Abstract

We initiate the theoretical study of the problem of minimizing the size of an iBGP overlay in
an Autonomous System (AS) in the Internet subject to a natural notion of correctness derived
from the standard "hot-potato" routing rules. For both natural versions of the problem (where
we measure the size of an overlay by either the number of edges or the maximum degree) we
prove that it is NP-hard to approximate to a factor better than $\Omega(\log n)$ and provide
approximation algorithms with ratio $\tilde{O}(\sqrt{n})$. This algorithm is based on a natural LP relaxation and
randomized rounding technique inspired by the recent work on approximating directed spanners
by Bhattacharyya et al. [SODA 2009], Dinitz and Krauthgamer [STOC 2011], and Berman et
al. [ICALP 2011]. In addition to this theoretical algorithm, we give a slightly worse
$\tilde{O}(n^{2/3})$-approximation based on primal-dual techniques that has the virtue of being both fast (in theory
and in practice) and good in practice, which we show via simulations on the actual topologies
of five large Autonomous Systems.

The main technique we use is a reduction to a new connectivity-based network design problem
that we call Constrained Connectivity. In this problem we are given a graph $G = (V,E)$, and
for every pair of vertices $u,v \in V$ we are given a set $S(u,v) \subseteq V$ called the safe set of the pair.
The goal is to find the smallest subgraph $H = (V,F)$ of G in which every pair of vertices u, v is
connected by a path contained in S(u,v). We show that the iBGP problem can be reduced to the
special case of Constrained Connectivity where $G = K_n$ and safe sets are defined geometrically
based on the IGP distances in the AS. Indeed, our algorithmic upper bounds generalize to
Constrained Connectivity on $K_n$, and our $\Omega(\log n)$-lower bound for the special case of iBGP
implies hardness for the general case. Furthermore, we believe that Constrained Connectivity is
an interesting problem in its own right, so we provide stronger hardness results ($2^{\log^{1-\epsilon} n}$-hardness
of approximation based on reductions from Label Cover) and integrality gaps ($n^{1/3-\epsilon}$ based on
random instances of Unique Games) for the general case. On the positive side, we show that
Constrained Connectivity turns out to be much simpler for some interesting special cases other
than iBGP: when safe sets are symmetric and hierarchical, we give a polynomial time algorithm
that computes an optimal solution.
1 Introduction

The Internet consists of a number of interconnected subnetworks called Autonomous Systems
(ASes). As described in [?], the way that routes to a given destination are chosen by routers
within an AS can be viewed as follows. Routers have a ranking of routes based on economic
considerations of the AS. Without loss of generality, in what follows we assume that all routes are
equally ranked. Thus routers must use some tie-breaking scheme in order to choose a route from
amongst the equally ranked routes. Tie-breaking is based on traffic engineering considerations and
in particular, the goal is to get packets out of the AS as quickly as possible (called hot-potato
routing).

An AS attempts to achieve hot-potato routing using iBGP, the version of the interdomain
routing protocol BGP [17] used by routers within a subnetwork to announce routes to each other
that have been learned from outside the subnetwork. An iBGP configuration is defined by a 
signaling graph, which is supposed to enforce hot-potato routing. Unfortunately, while iBGP has 
many nice properties that make it useful in practice, constructing a good signaling graph turns 
out to be a computationally difficult problem. For example, it is not clear a priori that it is even 
possible to check in polynomial time that a signaling graph is correct, i.e. it is not obvious that 
the problem is even in NP! In this paper we study the problem of constructing small and correct 
signaling graphs, as well as a natural extension to a more general problem that we call Constrained 
Connectivity. 

1.1 iBGP 

At a high level, iBGP works as follows. The routers that initially know of a route are called border 
routers. (These initial routes are those learned by the border routers from routers outside the AS.) 
The border router that initially knows of a route is said to be the egress router of that route. Each 
border router knows of at most one route. Thus an initial set of routes F defines a set of egress
routers $X_F$ where there is a one-to-one relationship between routes in F and routers in $X_F$. The
AS has an underlying physical network with edge weights (e.g., IGP distances or OSPF weights). 
The distance between two routers is then defined to be the length of the shortest path (according 
to the edge weights) between them. Given a set of routes, a router will rank highest the one whose 
egress router is closest according to this definition of distance. The signaling graph H is an overlay 
network whose nodes represent routers and whose edges represent the fact that the two routers at 
its endpoints use iBGP to inform one another of their current chosen route. The endpoints of an 
edge in H are called iBGP neighbors. A path in H is called a signaling path. Note that iBGP
neighbors are not necessarily neighbors in the underlying graph, since H is an overlay and can 
include any possible edge. 
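
The ranking rule above (prefer the route whose egress router is closest in the underlying weighted network) can be illustrated with a minimal Python sketch. The function names `all_pairs_distances` and `closest_egress`, and the representation of the physical network as a list of `(u, v, weight)` triples, are assumptions made for illustration and do not come from the paper.

```python
import math

def all_pairs_distances(n, weighted_edges):
    """Floyd-Warshall shortest-path distances on an undirected weighted graph.

    Routers are numbered 0..n-1 (an assumption of this sketch);
    weighted_edges is a list of (u, v, weight) triples, e.g. IGP weights.
    """
    d = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0.0
    for u, v, w in weighted_edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def closest_egress(d, egress):
    """For every router r, return the egress router nearest to r.

    This is the route r would rank highest if it heard about every route.
    """
    return [min(egress, key=lambda e: d[r][e]) for r in range(len(d))]
```

For example, on a path 0-1-2-3 with weights 1, 2 and 4 and egress routers 0 and 3, routers 0, 1 and 2 all prefer egress 0, while router 3 prefers itself.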

Finally, iBGP can be thought of as working as follows: in an asynchronous fashion, each router 
considers all the latest routes it has heard about from its iBGP neighbors, chooses the one with 
the closest egress router and tells its iBGP neighbors about the route it has chosen. This continues 
until no router learns of a route whose egress router is closer than that of its currently chosen route. 
When this process ends the route chosen by router r is denoted by R(r). Let P(r) be the shortest
path from r to E(r), the egress router of R(r). When a packet arrives at r, it sends it to the next
router r' on P(r), r' in turn sends the packet to the next router on P(r') and so on. Thus if P(r')
is not the subpath of P(r) starting at r' then the packet will not get routed as r expected.

A signaling graph H has the complete visibility property for a set of egress routers $X_F$ if each
router r hears about (and hence chooses as R(r)) the route in F whose egress router E(r) is closest
to r from amongst all routers in $X_F$. It is easy to see that H will achieve hot-potato routing for
$X_F$ if and only if it has the complete visibility property for $X_F$. So we say that a signaling graph
is correct if it has the complete visibility property for all possible $X_F$.

Clearly if H is the complete graph then H is correct. Because of this, the default configuration
of iBGP and the original standard was to maintain a complete graph, also called a full mesh [17].
However the complete graph is not practical and so network managers have adopted various
configuration techniques to reduce the size of the signaling graph [?, ?]. Unfortunately these methods
do not guarantee correct signaling graphs [?, ?]. Thus our goal is to determine correct signaling
graphs with fewer edges than the complete graph. Slightly more formally, two natural questions
are to minimize the number of edges in the signaling graph or to minimize the maximum number
of iBGP neighbors for any router while guaranteeing correctness. We define iBGP-Sum to be
the problem of finding a correct signaling graph with the fewest edges, and similarly we define
iBGP-Degree to be the problem of finding a correct signaling graph with the minimum possible
maximum degree.

1.2 Constrained Connectivity 

All we know a priori about the complexity of iBGP-Sum and iBGP-Degree is that they are in
$\Sigma_2$ (the second existential level of the polynomial hierarchy), since the statement of correctness is
that "there exists a small graph H such that for all possible subsets $X_F$ each router hears about
the route with the closest egress router". In particular, it is not obvious that these problems are in
NP, i.e. that there is a short certificate that a signaling graph is correct. However, it turns out that
these problems are actually in NP (see Section 2.1), and the proof of this fact naturally gives rise
to a more general network design problem that we call Constrained Connectivity. In this problem
we are given a graph $G = (V, E)$ and for each pair of nodes $(u,v) \in V \times V$ we are given a set
$S(u,v) \subseteq V$. Each such S(u,v) is called a safe set and it is assumed that $u,v \in S(u,v)$. We say
that a subgraph $H = (V, F)$ of G is safely connected if for each pair of nodes (u, v) there is a path
in H from u to v where each node in the path is in S(u, v). As with iBGP, we are interested in two
optimization versions of this problem:

1. Constrained Connectivity-Sum: compute a safely connected subgraph H with the
minimum number of edges, and

2. Constrained Connectivity-Degree: compute a safely connected subgraph H that
minimizes the maximum degree over all nodes.
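
The feasibility condition shared by both versions can be checked with one restricted breadth-first search per pair. The sketch below is illustrative only: the names `has_safe_path` and `is_safely_connected`, the numbering of nodes as 0..n-1, and the choice to store safe sets in a dictionary keyed by ordered pairs are assumptions, not notation from the paper.

```python
from collections import deque

def has_safe_path(adj, safe_set, u, v):
    """BFS from u to v that is only allowed to visit nodes in safe_set."""
    if u == v:
        return True
    seen = {u}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b in safe_set and b not in seen:
                if b == v:
                    return True
                seen.add(b)
                queue.append(b)
    return False

def is_safely_connected(n, edges, safe):
    """Check that every ordered pair (u, v) has a u-v path inside safe[(u, v)].

    edges is a list of undirected edges (a, b) of the candidate subgraph H;
    safe maps each ordered pair (u, v), u != v, to a set containing u and v.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return all(has_safe_path(adj, safe[(u, v)], u, v)
               for u in range(n) for v in range(n) if u != v)
```

Each search takes time linear in the size of H, so the whole check is polynomial. The Sum and Degree objectives differ only in how the size of a feasible H is measured.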



It turns out (see Theorem 2.1) that the iBGP problems can be viewed as Constrained
Connectivity problems with $G = K_n$ and safe sets defined in a particular geometric way. While the
motivation for studying Constrained Connectivity comes from iBGP, we believe that it is an
interesting problem in its own right. It is an extremely natural and general network design problem that,
somewhat surprisingly, seems to have not been considered before. While we only provide negative
results for the general problem (hardness of approximation and integrality gaps), a better
understanding of Constrained Connectivity might lead to a better understanding of other network design
problems, both explicitly via reductions and implicitly through techniques. For example, many of
the techniques used in this paper come from recent literature on directed spanners [?, ?, ?], and
given these similarities it is not unreasonable to think that insight into Constrained Connectivity
might provide insight into directed spanners.

For a more direct example, there is a natural security application of Constrained Connectivity. 
Suppose we have n players who wish to communicate with each other but they do not all trust one 
another with messages they send to others. That is, when u wishes to send a message to v there is a
subset S(u, v) of players that it trusts to see the messages that it sends to v. Of course, if for every
pair of players there were direct communication channels between the two players, then there would
be no problem. But suppose there is a cost to protect communication channels from eavesdropping
or other such attacks. Then a goal would be to have a network of fewer than $O(n^2)$ communication
channels that would still allow a route from each u to each v with the route completely contained
within S(u,v). Thus this problem defines a Constrained Connectivity-Sum problem.

1.3 Summary of Main Results 

In Section 3 we give a polynomial approximation for the iBGP problems, by giving the same
approximations for the more general problem of Constrained Connectivity on $K_n$.

Theorem 3.4. There is an $\tilde{O}(\sqrt{n})$-approximation to the Constrained Connectivity problems
on $K_n$.

Corollary. There is an $\tilde{O}(\sqrt{n})$-approximation to iBGP-Sum and iBGP-Degree.

To go along with these theoretical upper bounds, we design a different (but related) algorithm
for Constrained Connectivity-Sum on $K_n$ that provides a worse theoretical upper bound (a
$\tilde{O}(n^{2/3})$-approximation) but is faster in both practice and theory, and show by simulation on five
real AS topologies (Telstra, Sprint, NTT, TINET, and Level 3) that in practice it provides an
extremely good approximation. Details of these simulations are in Section 3.3.

To complement these upper bounds, in Section 4 we show that the iBGP problem is hard to
approximate, even with the extra power afforded us by the geometry of the safe sets:

Theorems 4.5 and 4.4. It is NP-hard to approximate iBGP-Sum or iBGP-Degree to a
factor better than $\Omega(\log n)$.

We then study the more general Constrained Connectivity problems, and in Section 5 we show
that the fully general Constrained Connectivity problems are hard to approximate:

Theorem 5.2. The Constrained Connectivity-Sum and Constrained Connectivity-Degree
problems do not admit a $2^{\log^{1-\epsilon} n}$-approximation algorithm for any constant $\epsilon > 0$ unless
$\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\mathrm{polylog}(n)})$.

This is basically the same inapproximability factor as for Label Cover, and in fact our reduction
is from a minimization version of Label Cover known as Min-Rep. Moreover, we show that the
natural LP relaxation has a polynomial integrality gap of $\Omega(n^{1/3-\epsilon})$.

Finally, in Section 6 we consider some other special cases of Constrained Connectivity that turn
out to be easier. In particular, we say that a collection of safe sets is symmetric if $S(x,y) = S(y,x)$
for all $x,y \in V$ and that it is hierarchical if for all $x,y,z \in V$, if $z \in S(x,y)$ then $S(x,z) \subseteq S(x,y)$
and $S(z,y) \subseteq S(x,y)$. It turns out that all of our hardness results and integrality gaps also hold
for symmetric instances, but adding the hierarchical property makes things easier:

Theorem 6.6. Constrained Connectivity-Sum with symmetric and hierarchical safe sets
can be solved optimally in polynomial time.
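
The two structural properties in this special case are easy to test directly from their definitions. The following Python sketch is illustrative only: the function names and the dictionary representation of safe sets (keyed by ordered pairs of distinct nodes 0..n-1) are assumptions, and sets of the form S(x, x), which the definition would touch when z equals x or y, are skipped because those cases hold trivially.

```python
def is_symmetric(n, safe):
    """S(x, y) == S(y, x) for every pair of distinct nodes."""
    return all(safe[(x, y)] == safe[(y, x)]
               for x in range(n) for y in range(n) if x != y)

def is_hierarchical(n, safe):
    """If z is in S(x, y) then S(x, z) and S(z, y) are both subsets of S(x, y)."""
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            outer = safe[(x, y)]
            for z in outer:
                # z == x would refer to S(x, x); skipped as trivial.
                if z != x and not safe[(x, z)] <= outer:
                    return False
                # z == y would refer to S(y, y); skipped as trivial.
                if z != y and not safe[(z, y)] <= outer:
                    return False
    return True
```

Both checks run in polynomial time, so deciding whether an instance falls into this easier class is itself not a bottleneck.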



1.4 Related Work 



Issues involving eBGP, the version of BGP that routers in different ASes use to announce routes
to one another, have recently received significant attention from the theoretical computer science
community, especially stability and game-theoretic issues (e.g., [10, 14, ?]). However, not nearly as
much work has been done on problems related to iBGP, which distributes routes internally in an AS.
There has been some work on the problem of guaranteeing hot-potato routing in any AS with a route
reflector architecture [?]. These earlier papers did not consider the issue of finding small signaling
graphs that achieved the hot-potato goal. Instead they either provided sufficient conditions for
correctness relating the underlying physical network with the route reflector configuration [11] or
they showed that by allowing some specific extra routes to be announced (rather than just the
one chosen route) they would guarantee a version of hot-potato routing [?]. The first people to
consider the problem of designing small iBGP overlays subject to achieving hot-potato correctness
were Vutukuru et al. [?], who used graph partitioning schemes to give such configurations.
But while they proved that their algorithm gave correct configurations, they only gave simulated
evidence that the configurations it produced were small. Buob et al. [?] considered the problem
of designing small correct solutions and gave a mathematical programming formulation, but then
simply solved the integer program using super-polynomial time algorithms.

2 Preliminaries 

2.1 Relationship between iBGP and Constrained Connectivity 

We will now show that the iBGP problems are just special cases of Constrained Connectivity-Sum
and Constrained Connectivity-Degree. This will be a natural consequence of the proof
that iBGP-Sum and iBGP-Degree are in NP.

To see this we will need the following definitions. We will assume that there are no ties, i.e. all 
distances are distinct. For two routers x and y, let $D(x,y) = \{w : d(x,w) > d(x,y)\}$ be the set of
routers that are farther from x than y is. Let $S(x,y) = \{w : d(w,y) < d(w, D(x,y))\} \cup \{y\}$ be the
set of routers that are closer to y than to any router not in the ball around x of radius d(x, y). We
will refer to S(x,y) as "safe" routers for the pair (x,y). A path from x to y in a signaling graph
is said to be a safe signaling path if it is contained in S(x,y). It turns out that these safe sets
characterize correct signaling graphs: 

Theorem 2.1. An iBGP signaling graph H is correct if and only if for every pair $(x,y) \in V \times V$
there is a signaling path from y to x that uses only routers in S(x,y).

Proof. We first show that if every pair has a safe signaling path then every node hears about the
route that has the closest egress router no matter what the set of egress routers $X_F$ is. This is
simple: let x be a router, and let y be its closest egress router. Let r be the route whose egress
router is y. By assumption there is a signaling path from y to x that uses only routers in S(x,y).
By definition, every one of these routers is closer to y than to any router farther from x than y
is. Since y is the closest egress to x, this means that for all of the routers in S(x,y), y will be the
closest egress router. A simple induction then shows that the routers in a safe signaling path will
each choose r and hence tell their iBGP neighbor in the path about r. That is, x hears about r.

For the other direction we need to show that if a signaling graph is correct then every pair has
a safe signaling path. For contradiction, suppose that there is no safe signaling path from y to
x. Let $X_F$, the set of egress routers, be $D(x,y) \cup \{y\}$. Let r be the route whose egress router is
y. Since every router in D(x,y) is farther from x than y is, this means that for this set of egress
routers x is closer to y than any other egress. By correctness we know that x does hear about r.
Let $y = a_1, a_2, \ldots, a_k = x$ be the (or at least a) signaling path from y to x through which x hears
about r. Since there are no safe signaling paths from y to x, we know that there exists some i such
that $a_i \notin S(x,y)$. This means that there is some $w \in D(x,y)$ such that $d(a_i,w) < d(a_i,y)$. Since
we assumed correctness we know that $a_i$ heard about the route with the closest egress router z
to $a_i$, and $z \neq y$ (since w in particular is closer). So $a_i$ will not tell its iBGP neighbors about r,
which is a contradiction since $a_i$ is on the signaling path from which x heard about r. Thus a safe
signaling path must exist. □

Note that this condition is easy to check in polynomial time, so we have shown membership 
in NP. Also this characterization shows that the problems iBGP-Sum and iBGP-Degree are
Constrained Connectivity problems where the underlying graph G is $K_n$ and the safe sets are
defined by certain geometric properties. While the proof of this is obviously relatively simple, 
we believe that it is an important contribution of this paper as it allows us to characterize the 
behavior of a protocol (iBGP) using only the static information of the signaling graph and the 
network distances. 
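
Because the characterization depends only on the distances, the safe sets can be computed directly from a distance matrix. The sketch below is illustrative only: it assumes routers are numbered 0..n-1, that `d` is a symmetric matrix of pairwise distances with no ties, and the names `safe_set` and `all_safe_sets` are hypothetical.

```python
def safe_set(d, x, y):
    """S(x, y) = {w : d(w, y) < d(w, D(x, y))} union {y}.

    D(x, y) is the set of routers strictly farther from x than y is, and
    d(w, D) is the distance from w to the nearest router in D (infinite
    when D is empty, in which case every router is safe).
    """
    n = len(d)
    farther = [w for w in range(n) if d[x][w] > d[x][y]]  # D(x, y)
    safe = {y}
    for w in range(n):
        if all(d[w][y] < d[w][f] for f in farther):
            safe.add(w)
    return safe

def all_safe_sets(d):
    """Safe sets for every ordered pair of distinct routers."""
    n = len(d)
    return {(x, y): safe_set(d, x, y)
            for x in range(n) for y in range(n) if x != y}
```

As a small example, take four routers on a line at positions 0, 1, 3 and 7. For x at position 0 and y at position 1 the routers at 3 and 7 are farther from x than y is, and only x and y themselves are closer to y than to both of those, so the safe set is just the pair. By Theorem 2.1, any correct signaling graph for this instance must therefore contain the direct edge between them.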



2.2 Linear Programming Relaxations

There are two obvious linear programming relaxations of the Constrained Connectivity
problems (and thus the iBGP problems): the flow LP and the cut LP. For every pair $(u,v) \in V \times V$
let $\mathcal{P}_{uv}$ be the collection of u-v paths that are contained in S(u,v). The flow LP has a variable
$c_e$ for every edge $e \in E$ (called the capacity of edge e) and a variable f(P) for every u-v path P in
$\mathcal{P}_{uv}$ for every $(u,v) \in V \times V$ (called the flow assigned to path P). The flow LP simply requires
that at least one unit of flow is sent between all pairs while obeying capacity constraints:

$$
\begin{array}{lll}
\min & \sum_{e \in E} c_e & \\
\text{s.t.} & \sum_{P \in \mathcal{P}_{uv}} f(P) \ge 1 & \forall (u,v) \in V \times V \\
& \sum_{P \in \mathcal{P}_{uv} : e \in P} f(P) \le c_e & \forall e \in E,\ (u,v) \in V \times V \\
& 0 \le c_e \le 1 & \forall e \in E \\
& 0 \le f(P) \le 1 & \forall (u,v) \in V \times V,\ P \in \mathcal{P}_{uv}
\end{array}
$$



This is obviously a valid relaxation of Constrained Connectivity-Sum: given a valid
solution to Constrained Connectivity-Sum, let $P_{uv}$ denote the required safe u-v path for every
$(u,v) \in V \times V$. For every edge e in some $P_{uv}$ set $c_e$ to 1, and set $f(P_{uv})$ to 1 for every $(u,v) \in V \times V$.
This is clearly a valid solution to the linear program with the exact same value. To change the
LP for Constrained Connectivity-Degree we can just introduce a new variable $\lambda$, change the
objective function to $\min \lambda$, and add the extra constraints $\sum_{v : \{u,v\} \in E} c_{\{u,v\}} \le \lambda$ for all $u \in V$. And
while this LP can be exponential in size (since there is a variable for every path), it is also easy
to design a compact representation that has only $O(n^4)$ variables and constraints. This compact
representation has variables $f^{xy}_{(u,v)}$ instead of f(P), where $f^{xy}_{(u,v)}$ represents the amount of flow from
u to v along edge {u,v} for the demand (x,y). Then we can write the normal flow conservation
and capacity constraints for every demand (x,y) independently, restricted to S(x,y). Indeed, this
compact representation is one of the main reasons to prefer the flow LP over the cut LP.
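
One way to write the compact formulation out explicitly is sketched below. This is a standard multicommodity-style encoding rather than a formulation quoted from the text: treating each edge {u,v} as two arcs (u,v) and (v,u) that share the capacity $c_{\{u,v\}}$ is an assumption of this sketch, and all sums range over neighbors inside the safe set of the demand.

```latex
\begin{align*}
\min \quad & \sum_{e \in E} c_e \\
\text{s.t.} \quad
& \sum_{v \in S(x,y)} \left( f^{xy}_{(x,v)} - f^{xy}_{(v,x)} \right) \ge 1
  && \forall (x,y) \in V \times V \\
& \sum_{v \in S(x,y)} \left( f^{xy}_{(w,v)} - f^{xy}_{(v,w)} \right) = 0
  && \forall (x,y) \in V \times V,\ w \in S(x,y) \setminus \{x,y\} \\
& f^{xy}_{(u,v)} + f^{xy}_{(v,u)} \le c_{\{u,v\}}
  && \forall (x,y) \in V \times V,\ \{u,v\} \in E \text{ with } u,v \in S(x,y) \\
& f^{xy}_{(u,v)} \ge 0, \qquad 0 \le c_e \le 1
\end{align*}
```

There are at most $n^2$ demands and $O(n^2)$ arcs per demand, which matches the $O(n^4)$ count of variables, and any feasible flow for a demand decomposes into safe paths that respect the same capacities.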

The cut LP is basically equivalent to the flow LP, except that instead of requiring flow to be
sent, it requires the min-cut to be large enough. Given a pair $(u,v) \in V \times V$, let $\mathcal{S}(u,v) =
\{S \subseteq S(u,v) : u \in S \wedge v \notin S\}$ be the collection of safe set cuts that separate u and v. Furthermore,
given a set $S \in \mathcal{S}(u,v)$ let $\delta_{uv}(S) = \{e \in E : e \in (S, S(u,v) \setminus S)\}$ be the set of safe edges that
cross S. The cut LP has a variable $x_e$ for every edge e (equivalent to $c_e$ in the flow LP), and is
quite simple:

$$
\begin{array}{lll}
\min & \sum_{e} x_e & \\
\text{s.t.} & \sum_{e \in \delta_{uv}(S)} x_e \ge 1 & \forall u,v \in V,\ S \in \mathcal{S}(u,v)
\end{array}
$$

This LP simply minimizes the sum of the edge variables subject to the constraint that for every 
cut between two nodes there must be at least one safe edge crossing it. While the flow LP and 
the cut LP are not technically duals of each other (since capacities are variables), it is easy to see 
from the max flow-min cut theorem that they do in fact describe the same polytope (with respect 
to the capacity variables). Thus integrality gaps for one automatically hold for the other, as do 
approximations achieved by LP rounding. 

3 Algorithms for iBGP and Constrained Connectivity on Kn 

3.1 $\tilde{O}(\sqrt{n})$-approximation

In this section we show that there is a $\tilde{O}(\sqrt{n})$-approximation algorithm for both Constrained
Connectivity problems as long as the underlying graph is the complete graph $K_n$. This algorithm
is inspired by the recent progress on directed spanners by Bhattacharyya et al. [?], Dinitz and
Krauthgamer [?], and Berman et al. [?]. In particular, we use the same two-component framework
that they do: a randomized rounding of the LP and a separate random tree-sampling step. The
randomized rounding we do is simple independent rounding with inflated probabilities. The next
lemma implies that this works well when the safe sets are small.

Lemma 3.1. Let $E' \subseteq E$ be obtained by adding every edge $e \in E$ to E' independently with
probability at least $\min\{12 c_e \cdot |S(x,y)| \ln n, 1\}$. Then with probability at least $1 - 1/n^3$, E' will have
a path between x and y contained in S(x,y).

Proof. Let (X, Y) be a partition of S(x,y) so that $x \in X$ and $y \in Y$, i.e. (X, Y) is an x-y cut
of S(x,y). Note that there are only $2^{|S(x,y)|}$ such cuts, and by standard arguments if at least one
edge from every cut is chosen to be in E' then E' contains an x-y path in S(x,y). Since in
any LP solution at least one unit of flow is sent from x to y in S(x,y), every cut has capacity at
least 1. Let $\delta(X,Y)$ be the set of edges that cross the cut (X,Y). If $c_e \ge 1/(12 |S(x,y)| \ln n)$ for
any $e \in \delta(X,Y)$ then e is selected with probability 1, and thus (X,Y) is spanned. Otherwise, the
probability that no edge from $\delta(X,Y)$ is chosen is at most

$$\prod_{e \in \delta(X,Y)} \left(1 - 12 c_e \cdot |S(x,y)| \ln n\right) \le \exp\Big(-12 |S(x,y)| \ln n \cdot \sum_{e \in \delta(X,Y)} c_e\Big) \le e^{-12 |S(x,y)| \ln n}.$$

Thus by a simple union bound the probability
that we fail on any cut is at most $2^{|S(x,y)|} e^{-12 |S(x,y)| \ln n} < (2/e)^{12 |S(x,y)| \ln n} < 1/n^3$. □
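
The rounding step of Lemma 3.1 is straightforward to express in code once fractional capacities are available. The sketch below assumes the LP has already been solved elsewhere and that `capacity` maps each edge to its value $c_e$. The function name `round_edges` and the parameter `max_safe_size`, an upper bound on the sizes of the safe sets this phase is responsible for, are assumptions made for illustration.

```python
import math
import random

def round_edges(capacity, n, max_safe_size, rng=random):
    """Independent rounding with inflated probabilities.

    Each edge e is kept with probability min(12 * c_e * max_safe_size * ln n, 1).
    Using a common bound max_safe_size >= |S(x, y)| only increases the
    probabilities, so the hypothesis of Lemma 3.1 still holds for every
    pair whose safe set has at most max_safe_size nodes.
    """
    scale = 12 * max_safe_size * math.log(n)
    return {e for e, c in capacity.items()
            if rng.random() < min(scale * c, 1.0)}
```

The expected number of edges kept is at most `scale` times the LP objective, which is where the approximation factor of this phase comes from.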

Another important part of our algorithm will be random sampling that is independent of the LP. 
We will use two different types of sampling: star sampling for the sum version and edge sampling 
for the degree version. First we consider star sampling, in which we independently sample nodes 
with probability p, and every sampled node becomes the center of a star that spans the vertex set. 

Lemma 3.2. All pairs with safe sets of size at least s will be satisfied by random star sampling
with high probability if $p = 3 \ln n / s$.

Proof. Consider some pair (x,y) with $|S(x,y)| \ge s$. If some node (say z) from S(x,y) is sampled
then the pair is satisfied, since the creation of a star at z would create a path x-z-y that would
satisfy (x,y). The probability that no node from S(x,y) is sampled is

$$(1-p)^{|S(x,y)|} \le (1-p)^s \le e^{-ps} = e^{-3 \ln n} = 1/n^3.$$

Since there are fewer than $n^2$ pairs, we can take a union bound over all pairs (x,y) with $|S(x,y)| \ge s$,
giving us that all such pairs are satisfied with probability at least $1 - 1/n$. □
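
Star sampling as used in Lemma 3.2 can be sketched in a few lines of Python. The function name `star_sample` and the representation of edges as sorted pairs of node indices are assumptions of this sketch. The probability is capped at 1, which only matters when s is smaller than 3 ln n.

```python
import math
import random

def star_sample(n, s, rng=random):
    """Sample each node with probability p = 3 ln n / s and build a spanning star at it.

    Returns the sampled centers and the set of undirected edges
    (as sorted pairs) of all the stars.
    """
    p = min(1.0, 3 * math.log(n) / s)
    centers = [v for v in range(n) if rng.random() < p]
    edges = {(min(c, v), max(c, v))
             for c in centers for v in range(n) if v != c}
    return centers, edges
```

Any pair whose safe set contains a sampled center z is satisfied by the two star edges {x,z} and {z,y}, which is exactly the event analyzed in the proof above.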

For edge sampling, we essentially consider the Erdős-Rényi graph $G_{n,p}$, i.e. we just sample every
edge independently with probability p. We will actually consider the union of $3 \log n$ independent
$G_{n,p}$ graphs, where $p = \frac{(1+\epsilon) \ln s}{s}$ for some small $\epsilon > 0$. Let H be this random graph.

Lemma 3.3. With probability at least $1 - 1/n$, all pairs with safe sets of size at least s will be
connected by a safe path in H.

Proof. Let (x,y) be a pair with $|S(x,y)| \ge s$. Obviously (x,y) is satisfied if the graph induced
on S(x,y) is connected. It is known [?] that there is some small $\epsilon$ with $0 < \epsilon < 1$ so that $G_{s,p}$ is
connected with probability at least 1/2. Since H is the union of $3 \log n$ instantiations of $G_{n,p}$, we
know that the probability that the subgraph of H induced on S(x,y) is not connected is at most
$1/n^3$. We can now take a union bound over all such (x,y) pairs, giving us that the probability that
there is some unsatisfied (x,y) pair with $|S(x,y)| \ge s$ is at most $1/n$. □

We will now combine the randomized rounding of the LP and the random sampling into a single
approximation algorithm. Our algorithm is divided into two phases: first, we solve the LP and
randomly include every edge e with probability $O(c_e \sqrt{n} \ln n)$. By Lemma 3.1 this takes care of safe
sets of size at most $\sqrt{n}$. Second, if the objective is to minimize the number of edges we do star
sampling with probability $(3 \ln n)/\sqrt{n}$, and if the objective is to minimize the maximum degree we
do edge sampling using the construction of Lemma 3.3 with $s = \sqrt{n}$. It is easy to see that this
algorithm with high probability results in a valid solution that is a $\tilde{O}(\sqrt{n})$-approximation.

Theorem 3.4. This algorithm is a $\tilde{O}(\sqrt{n})$-approximation to both Constrained Connectivity-Sum
and Constrained Connectivity-Degree on $K_n$.

Proof. We first argue that the algorithm does indeed give a valid solution to the problem. Let
(x,y) be an arbitrary pair. If $|S(x,y)| \le \sqrt{n}$, then Lemma 3.1 implies that the first phase of the
algorithm results in a safe path. If $|S(x,y)| > \sqrt{n}$, then Lemma 3.2 or Lemma 3.3 implies that the
second phase of the algorithm results in a safe path. So every pair has a safe path, and thus the
solution is valid.




We now show that the cost of this algorithm is at most $\tilde{O}(\sqrt{n}) \times \mathrm{OPT}$. We first consider the
objective function of minimizing the number of edges. In the LP rounding step we only increase
capacities by at most a factor of $\tilde{O}(\sqrt{n})$, so since the LP is a relaxation of the problem we know
that the expected cost of the rounding is at most $\tilde{O}(\sqrt{n}) \times \mathrm{OPT}$. For phase 2, in expectation
we choose $3 \sqrt{n} \ln n$ stars, for a total of at most $3 n^{3/2} \ln n$ edges. But since there is a demand for
every pair we know that $\mathrm{OPT} \ge n - 1$, so phase 2 has total cost at most $\tilde{O}(\sqrt{n}) \times \mathrm{OPT}$.

If instead our objective function is to minimize the maximum degree, then since phase 1 only
increases capacities by $\tilde{O}(\sqrt{n})$ we know that after phase 1 the maximum degree is at most
$\tilde{O}(\sqrt{n}) \times \mathrm{OPT}$ (by a Chernoff bound, with high probability every vertex has degree at most $\tilde{O}(\sqrt{n})$ times its
fractional degree in the LP). In phase 2, a simple Chernoff bound implies that with high probability
every node gets $\tilde{O}(\sqrt{n})$ new edges, and thus the node with maximum degree still has degree at most
$\tilde{O}(\sqrt{n}) \times \mathrm{OPT}$. □

3.2 Primal-Dual Algorithm 

We also have a primal-dual algorithm that gives a slightly worse result for the Constrained 
Connectivity-Sum problem. While this algorithm and its analysis is slightly more complicated 
and only works for the Sum version, by not solving the linear program we get a faster algorithm. 
In particular, the best known algorithms for solving linear programs with m variables take $\Omega(m^{3.5})$
time on general LPs, so since there are $n^4$ variables in the compact version of the flow LP this
takes $\Omega(n^{14})$ time. The primal-dual algorithm, on the other hand, is significantly faster: a naive
analysis shows that it takes 0{n^) time. 

In this algorithm we use the cut LP rather than the flow LP (in fact, the algorithm is very
similar to the primal-dual algorithm for Steiner Forest, which uses a similar cut LP but doesn't
have to deal with safe sets). Since this is a primal-dual algorithm, instead of solving and rounding
the cut LP we will consider the dual, which has a variable $y^{uv}_S$ for every pair (u,v) and $S \in \mathcal{S}(u,v)$.
We say that an edge $e \in S(u,v)$ if both endpoints of e are in S(u,v).

$$
\begin{array}{lll}
\max & \sum_{u,v \in V} \sum_{S \in \mathcal{S}(u,v)} y^{uv}_S & \\
\text{s.t.} & \sum_{u,v \in V : e \in S(u,v)} \ \sum_{S \in \mathcal{S}(u,v) : e \in \delta_{uv}(S)} y^{uv}_S \le 1 & \forall e \in E \\
& y^{uv}_S \ge 0 & \forall u,v \in V,\ S \in \mathcal{S}(u,v)
\end{array}
$$

Unfortunately we will not be able to use a pure primal-dual approximation, but will have to 
trade off with a random sampling scheme as in the rounding algorithm. So instead of this primal, 
we will only have constraints for $u,v \in V$ with $|S(u,v)| \le t$ for some parameter t that we will
set later. Thus in the dual we will only have variables $y^{uv}_S$ for (u,v) with $|S(u,v)| \le t$. This
clearly preserves the property that the primal is a valid relaxation of the actual problem. Let
$D = \{(u,v) : |S(u,v)| \le t\}$.

Our primal-dual algorithm, like most primal-dual algorithms, maintains a set of active dual 
variables that it increases until some dual constraint becomes tight. Once that happens we buy an 
edge (i.e. set some Xe to 1 in the primal), change the set of active dual variables, and repeat. We 
do this until we have a feasible primal. 

Initially our primal solution H is empty and the active dual variables are $y^{uv}_{\{u\}}$ for every $(u,v) \in D$,
i.e. every node u has an active dual variable for every other v that it has a demand with,
corresponding to the cut in S(u,v) that is the singleton {u}. We raise these variables uniformly
until some constraint (say the one for e = {w,z}) becomes tight. At this point we add e to our
current primal solution H. We now change the active dual variables by "merging" moats that
cross e. In particular, there are some active variables $y^{uv}_S$ where $e \in \delta_{uv}(S)$ (which implies that
$w,z \in S(u,v)$ as well). Let $H|_{S(u,v)}$ denote the subgraph of H induced on S(u,v). Without loss
of generality we can assume that $w \in S$ and $z \notin S$. Let $T \subseteq S(u,v)$ be the connected component
of $H|_{S(u,v)}$ containing z. We now make $y^{uv}_S$ inactive, and make $y^{uv}_{S \cup T}$ active. We do this for all
such active variables, and then repeat this process (incrementing all dual variables until some dual
constraint becomes tight, adding that edge to H, and then merging moats that cross it) until all
pairs $(u,v) \in D$ have a safe path in H.

Lemma 3.5. This algorithm always maintains a feasible dual solution and an active set that does 
not contribute to any tight constraint. 

Proof. We will show this by induction, where the inductive hypothesis is that the dual solution
is feasible and that no dual variables that contribute to a tight constraint are active. Initially all
dual variables are 0, so it is obviously a feasible solution and no constraints are tight. Now suppose
this is true after we add some edge e'. We need to show that it is also true after we add the next
edge e = {w, z}. By induction the dual solution after we added e' is feasible and none of the active
dual variables contribute to any tight constraints. Thus raising the active dual variables until some
constraint becomes tight maintains dual feasibility.

To prove that no active variables contribute to a tight constraint, note that the only new tight
constraint is the one corresponding to e. The only variables contributing to that constraint are of
the form $y^{uv}_S$ where $e \in \delta_{uv}(S)$. But our algorithm made all of these variables inactive, and only
added new active variables for sets S' that contain both w and z and thus do not contribute to the
newly tight constraint. Furthermore, these sets S' are formed by the union of S and the connected
component in $H|_{S(u,v)}$ containing the other endpoint, so no newly active variable contributes to a
constraint that became tight previously (since they correspond to edges in H). □

Theorem 3.6. The primal-dual algorithm returns a graph $H$ with at most $O(t^2) \times OPT$ edges in
which every pair $(u,v)$ with $|S(u,v)| \le t$ has a safe path.

Proof. After every iteration of the algorithm all of the tight constraints are added to $H$, which
together with Lemma 3.5 implies that the algorithm never gets stuck. Thus it will run until
every pair $u,v$ with $|S(u,v)| \le t$ has a safe path. It just remains to show that the total number
of edges returned is at most $O(t^2) \times OPT$. To see this, note that every edge in $H$ corresponds
to a tight constraint in the feasible dual solution we constructed, so if $e \in H$ then
$\sum_{(u,v) : e \subseteq S(u,v)} \sum_{S \in \mathcal{S}(u,v) : e \in \delta_{uv}(S)} y^{uv}_S = 1$.
Thus we have that

\[
\begin{aligned}
|H| = \sum_{e \in H} 1
  &= \sum_{e \in H} \; \sum_{(u,v) \in D : e \subseteq S(u,v)} \; \sum_{S \in \mathcal{S}(u,v) : e \in \delta_{uv}(S)} y^{uv}_S \\
  &= \sum_{(u,v) \in D} \; \sum_{S \in \mathcal{S}(u,v)} |H \cap \delta_{uv}(S)| \, y^{uv}_S \\
  &\le t^2 \sum_{(u,v) \in D} \; \sum_{S \in \mathcal{S}(u,v)} y^{uv}_S \\
  &\le t^2 \times OPT,
\end{aligned}
\]

where the last inequality is by duality, and the next to last inequality is because
$|H \cap \delta_{uv}(S)| \le |\delta_{uv}(S)| \le t^2$ (since $(u,v) \in D$). □

Lemma 3.7. The primal-dual algorithm takes at most $\tilde{O}(n^6)$ time.

Proof. The primal-dual algorithm adds at least one new edge per iteration, so there can be at most
$n^2$ iterations. In each iteration we have to figure out the current value of every dual constraint
and the number of active variables in each constraint, which together will imply what the next tight
constraint is and how much to raise the $y$ variables. We then need to raise the active variables by
that amount and merge moats. Note that for every demand there are at most two active moats,
so the total number of active variables is at most $O(n^2)$. Thus each iteration can be done in time
$O(n^4)$, where the dominant term is the time taken to calculate the value of each dual constraint.
So the total time is $\tilde{O}(n^6)$, where the extra polylogarithmic terms are due to data structure
overhead. □

Now we can trade this off with the random sampling solution for large safe sets to get an actual
approximation algorithm:

Theorem 3.8. There is an $\tilde{O}(n^{2/3})$-approximation algorithm for the Constrained
Connectivity-Sum problem on $K_n$ that runs in time $\tilde{O}(n^6)$.

Proof. Our algorithm first runs the primal-dual algorithm with $t = O((n \log n)^{1/3})$. By
Theorem 3.6, this returns a graph $H$ with at most $O((n \log n)^{2/3}) \times OPT$ edges in which
there is a safe path for every $(u,v)$ with $|S(u,v)| \le O((n \log n)^{1/3})$. We then use the random
star sampling lemma with $s = \Omega((n \log n)^{1/3})$ and thus $p = O((\log n)^{2/3} / n^{1/3})$. By
that lemma this satisfies the rest of the demands (the pairs $(u,v)$ with $|S(u,v)| \ge s$) with high
probability, and the number of edges added is with high probability at most
$O(pn^2) = O((n \log n)^{2/3} n) = O((n \log n)^{2/3}) \times OPT$ as desired.

The time bound follows from Lemma 3.7 together with the trivial fact that star sampling can
be done in $O(n^2)$ time. □
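
The choice of threshold can be read as balancing the two bounds. The following is a rough sketch, not the authors' derivation: it assumes that the sampling probability needed to hit every safe set of size at least $t$ is about $p \approx (\ln n)/t$ (constants suppressed), and it uses the fact that $OPT \ge n - 1$ so that $pn^2$ added edges cost a ratio of roughly $pn$.

\[
\underbrace{t^2}_{\text{primal-dual ratio}} \;\approx\; \underbrace{p\,n}_{\text{sampling ratio}} \;=\; \frac{n \ln n}{t}
\quad\Longrightarrow\quad t^3 \approx n \ln n
\quad\Longrightarrow\quad t \approx (n \ln n)^{1/3},
\]

at which point both ratios are about $(n \ln n)^{2/3}$.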

3.3 Simulations 

In this section we discuss some of the results of simulations using our algorithms. While we believe
that the main contribution of this work is theoretical, it is interesting that the algorithms are fast
enough to be practical and give solutions that are in practice far superior to the worst case
$\tilde{O}(n^{2/3})$ bound.

AS     Name      Number of PoPs   Number of links
1221   Telstra   44               88
1239   Sprint    52               168
2914   NTT       70               222
3257   TINET     41               174
3356   Level 3   63               570

Table 1: ISP Topologies Used

We implemented both the LP rounding and the primal-dual algorithm for the iBGP-Sum
problem. However, the rounding algorithm turned out to be impractical, mainly due to memory
constraints. Recall that in the compact version of the flow LP there is a flow variable
$f^{xy}_{(u,v)}$ for every pair $(u,v)$ and $(x,y)$. This variable denotes the amount of flow from
$u$ to $v$ along the edge $\{u,v\}$ for the demand $(x,y)$. There are also $O(n^4)$ capacity
constraints. So on even a modest size AS topology, say one with 50 nodes, the linear program has
over six million variables and constraints. Running on a commodity desktop, the memory used by CPLEX
merely to create and store this LP results in an extremely large running time, even without
attempting to solve it. Our primal-dual algorithm, on the other hand, only needs to keep track of
$O(n^2)$ active dual variables and the current values of the $O(n^2)$ dual constraints. So we can
actually run this algorithm on reasonably sized graphs.
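
As a rough check of the size estimate above, the count can be reproduced directly. This sketch assumes one flow variable per combination of a node pair (u, v) and a demand pair (x, y), i.e. about n^2 times n^2 variables; the function name is illustrative only.

```python
def flow_lp_variable_count(n):
    """Approximate number of flow variables in the compact flow LP:
    one per (node pair, demand pair) combination, i.e. n^2 * n^2."""
    node_pairs = n * n
    demand_pairs = n * n
    return node_pairs * demand_pairs


if __name__ == "__main__":
    # A 50-node topology, the example size mentioned in the text.
    print(flow_lp_variable_count(50))
```

For 50 nodes this gives 6,250,000, which matches the "over six million" figure quoted in the text.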

One change that we make from the theoretical algorithm is the tradeoff with random sampling. 
In the theoretical analysis we are only able to get a nontrivial approximation bound by using the 
primal-dual algorithm to handle small safe sets and random sampling to handle large safe sets, but 
experimentation revealed that the simpler algorithm of using the primal-dual technique to handle 
all safe sets was sufficient. 

To test out this algorithm we ran it on five real-world ISP topologies with link weights given 
by the Rocketfuel project |T^. Our implementation is still relatively slow, so we consider Point- 
of-Presence level topologies rather than router-level topologies. We feel that this is not unrealistic, 
though, since in practice the routers at a given PoP would probably just use a single router at that 
PoP as a route reflector [15, Section 3.1]. The topologies we used are summarized in Table 1.

We compare the number of iBGP sessions used by a full mesh to the number of edges in the
overlay produced by the primal-dual algorithm. We assume (conservatively) that all the nodes in
the topology are external BGP routers. Our results are shown in Table 2 and in Figure 1. These
results show that the primal-dual algorithm gives graphs that are much smaller than the default
full mesh. Of course, we do not model additional requirements such as fault-tolerance and stability,
but the massive gap suggests that even if adding extra requirements results in doubling or tripling
the size of the overlay we will still see a large benefit over the full mesh. Moreover, these results
show that the $\tilde{O}(n^{2/3})$ upper bound on the approximation ratio that we proved in
Section 3.2 is extremely pessimistic. On these actual topologies the primal-dual algorithm gives
results that are only slightly larger than $n$ (the worst case is for Level 3, in which the
primal-dual algorithm gives an overlay with about $2.75 \times n$ edges). Since $n - 1$ is an obvious
lower bound (the overlays clearly must be connected), this means that in practice our algorithm gives
an $O(1)$-approximation.

AS     full-mesh   Primal-Dual   Fraction of full-mesh
1221   946         44            4.65%
1239   1326        83            6.26%
2914   2415        109           4.5%
3257   820         75            9.15%
3356   1953        173           8.86%

Table 2: Primal-Dual vs. full-mesh
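
The derived columns of the two tables can be recomputed from the PoP counts and overlay sizes. The short script below is a sanity-check sketch only; the data are copied from Tables 1 and 2, while the names `RESULTS`, `full_mesh_sessions`, and `summarize` are illustrative and do not come from the authors' implementation.

```python
# (AS number, number of PoPs, primal-dual overlay edges), from Tables 1 and 2.
RESULTS = [
    (1221, 44, 44),
    (1239, 52, 83),
    (2914, 70, 109),
    (3257, 41, 75),
    (3356, 63, 173),
]


def full_mesh_sessions(n):
    """Number of iBGP sessions in a full mesh on n routers."""
    return n * (n - 1) // 2


def summarize(results):
    """Recompute full-mesh size, fraction of full mesh, and edges per node."""
    rows = []
    for asn, n, overlay in results:
        mesh = full_mesh_sessions(n)
        rows.append({
            "as": asn,
            "full_mesh": mesh,
            "overlay": overlay,
            "fraction": overlay / mesh,
            "edges_per_node": overlay / n,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(f"AS{row['as']}: full mesh {row['full_mesh']}, "
              f"overlay {row['overlay']} ({100 * row['fraction']:.2f}%), "
              f"{row['edges_per_node']:.2f} edges per node")
```

The edges-per-node column makes the comparison with the trivial $n - 1$ lower bound explicit.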



Figure 1: Primal-Dual vs. full-mesh (one group per topology: AS1221, AS1239, AS2914, AS3257, AS3356)

4 Complexity of iBGP-Sum and iBGP-Degree

In this section we will show that the iBGP problems are $\Omega(\log n)$-hard to approximate by a
reduction from Hitting Set (or equivalently from Set Cover). This is a much weaker hardness than the
$2^{\log^{1-\epsilon} n}$ hardness that we prove for the general Constrained Connectivity problems in
Section 5, but the iBGP problems are much more restrictive. We note that this $\Omega(\log n)$
hardness is easy to prove for Constrained Connectivity on $K_n$; the main difficulty is constructing
a metric so that the geometrically defined safe sets of iBGP have the structure that we want.

We begin by giving a useful gadget that encodes a Hitting Set instance as an instance of an
iBGP problem in which all we care about is minimizing the degree of a particular vertex. We will
then show how a simple combination of these gadgets can be used to prove that iBGP-Degree is
hard to approximate, and how more complicated modifications to the gadget can be used to prove
that iBGP-Sum is hard to approximate.

Suppose we are given an instance of hitting set with elements $1, 2, \ldots, n$ (note that we are
overloading these as both integers and elements) and sets $T_1, T_2, \ldots, T_m$. Our gadget will
contain a node $x$ whose degree we want to minimize, a node $a_i$ for all elements
$i \in \{1, \ldots, n\}$, and a node $b_{T_j}$ for each set $T_j$ in the instance. We will also have
four extra "dummy" nodes: $z$, $y$, $u$, and $h$. The following table specifies some of the distances
between points. All other distances are the shortest path graph distances given these. Let $M$ be
some large value (e.g. 20), and let $\epsilon$ be some extremely small value larger than 0.

Pair                 Distance
x, z                 M
x, b_{T_j}           M + 1.4 + jε
z, y                 1.5
z, a_i               1 + iε
z, u                 2
a_i, b_{T_j}         1 + (i + j)ε   (if i ∈ T_j)
a_i, u               1.1
b_{T_j}, h           1 + jε

It is easy to check that this is indeed a metric space. Informally, we want to claim that any
solution to the iBGP problems on this instance must have an edge from $x$ to $a_i$ nodes such that
the associated elements $i$ form a hitting set. Here $y$, $u$, and $h$ are nodes that force the safe
sets into the form we want, and $z$ is used to guarantee the existence of a small solution.

Lemma 4.1. Let $E$ be any feasible solution to the above iBGP instance. For every vertex $b_{T_j}$
there is either an edge $\{x, b_{T_j}\} \in E$ or an edge $\{x, a_i\} \in E$ where $i \in T_j$.

Proof. We will prove this by analyzing $S(x, b_{T_j})$. If we can show that
$S(x, b_{T_j}) = \{x, b_{T_j}\} \cup \{a_i : i \in T_j\}$ then we will be finished. Note that
$d(x, b_{T_j}) = M + 1.4 + j\epsilon$, so the vertices outside $B(x, d(x, b_{T_j}))$ are $y$
(distance $M + 1.5$ from $x$), $u$ (distance $M + 2$ from $x$), $h$ (distance at least $M + 2.4$ from
$x$), and $b_{T_k}$ with $k > j$ (distance $M + 1.4 + k\epsilon$ from $x$). The vertices inside the
ball are $x$, $z$, all $a_i$ nodes, and $b_{T_k}$ with $k \le j$.

Obviously $x$ and $b_{T_j}$ are in $S(x, b_{T_j})$ by definition. Let $a_i$ be a vertex with
$i \in T_j$. It is easy to verify that $a_i$ is closer to $b_{T_j}$ than to any vertex outside of the
ball: it has distance $1 + (i + j)\epsilon$ from $b_{T_j}$, distance at least $1 + (i + k)\epsilon$
from $b_{T_k}$ with $k > j$, distance $2.5 + i\epsilon$ from $y$, distance 1.1 from $u$, and distance
greater than 2 from $h$. So $a_i \in S(x, b_{T_j})$ as required. On the other hand, suppose
$i \notin T_j$. Then $d(a_i, b_{T_j}) > 2$, while $d(a_i, u) = 1.1$, so $a_i \notin S(x, b_{T_j})$.
Similarly, any vertex $b_{T_k}$ with $k < j$ is closer to $h$ (distance $1 + k\epsilon$) than to
$b_{T_j}$ (distance at least 2) and $z$ is closer to $y$ (distance 1.5) than to $b_{T_j}$ (distance at
least 2). Thus $S(x, b_{T_j}) = \{x, b_{T_j}\} \cup \{a_i : i \in T_j\}$, so $E$ must include an edge
from $x$ to either $b_{T_j}$ or an $a_i$ with $i \in T_j$. □
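
The safe-set computation in this proof can be checked numerically on a small instance. The sketch below rests on two assumptions that should be read as such: the listed gadget distances are those recoverable from the proof (x-z, z-y, z-u, z-a_i, a_i-u, x-b_{T_j}, b_{T_j}-h, and a_i-b_{T_j} for i in T_j), and the safe set S(s, t) is taken to be the set of vertices w that are strictly closer to t than to every vertex outside the ball B(s, d(s, t)), which is the criterion the proof applies. The names `gadget_metric` and `safe_set` are illustrative.

```python
import itertools


def gadget_metric(n, sets, M=20.0, eps=1e-3):
    """Shortest-path metric of the gadget; sets[j-1] plays the role of T_j
    and the elements are the integers 1..n."""
    nodes = ["x", "z", "y", "u", "h"]
    nodes += [("a", i) for i in range(1, n + 1)]
    nodes += [("b", j) for j in range(1, len(sets) + 1)]
    inf = float("inf")
    d = {(p, q): (0.0 if p == q else inf) for p in nodes for q in nodes}

    def link(p, q, w):
        d[p, q] = d[q, p] = w

    # Explicitly specified distances of the gadget.
    link("x", "z", M)
    link("z", "y", 1.5)
    link("z", "u", 2.0)
    for i in range(1, n + 1):
        link("z", ("a", i), 1 + i * eps)
        link(("a", i), "u", 1.1)
    for j, members in enumerate(sets, start=1):
        link("x", ("b", j), M + 1.4 + j * eps)
        link(("b", j), "h", 1 + j * eps)
        for i in members:
            link(("a", i), ("b", j), 1 + (i + j) * eps)

    # Floyd-Warshall: the intermediate node k is the outermost loop variable.
    for k, p, q in itertools.product(nodes, repeat=3):
        if d[p, k] + d[k, q] < d[p, q]:
            d[p, q] = d[p, k] + d[k, q]
    return nodes, d


def safe_set(nodes, d, s, t):
    """Assumed definition: w is safe for (s, t) if it is strictly closer to t
    than to every vertex outside the ball B(s, d(s, t))."""
    outside = [o for o in nodes if d[s, o] > d[s, t]]
    return {w for w in nodes if all(d[w, t] < d[w, o] for o in outside)}


if __name__ == "__main__":
    nodes, d = gadget_metric(3, [{1, 2}, {2, 3}])
    print(safe_set(nodes, d, "x", ("b", 1)))
    print(safe_set(nodes, d, "x", ("b", 2)))
```

On the instance with $T_1 = \{1, 2\}$ and $T_2 = \{2, 3\}$, the computed safe sets consist of $x$, the relevant $b$ node, and exactly the $a_i$ nodes with $i$ in the corresponding set, in line with the lemma.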

We now want to use this gadget to prove logarithmic hardness for iBGP-Sum. We will use the
basic gadget but will duplicate $x$. So there will be $\ell$ copies of $x$, which we will call
$x_1, x_2, \ldots, x_\ell$, and their distances are defined to be $d(x_i, z) = M + i\epsilon$ and
$d(x_i, b_{T_j}) = M + 1.4 + (i + j)\epsilon$ with all other distances defined to be the shortest
path. Note that all we did was modify the gadget to "break ties" between the $x_i$'s. Also note that
the shortest path between $x_i$ and $x_j$ is through $z$, for a total distance of
$2M + (i + j)\epsilon$. As before, let $H$ be the smallest hitting set.

Lemma 4.2. Any feasible iBGP-Sum solution has at least $\ell|H|$ edges.

Proof. It is easy to see that Lemma 4.1 still holds, i.e. that
$S(x_i, b_{T_j}) = \{x_i, b_{T_j}\} \cup \{a_k : k \in T_j\}$. Intuitively this is because all other
$x$ nodes are outside of $B(x_i, d(x_i, b_{T_j}))$ and all distances from $x_i$ to the gadget are the
same as before except with an additional $i\epsilon$. This implies that the number of $a_k$ and
$b_{T_j}$ nodes adjacent to $x_i$ in any feasible solution must be at least $|H|$, since if there
were fewer such adjacent nodes it would imply the existence of a smaller hitting set (any $b_{T_j}$
nodes adjacent to $x_i$ could just be covered using an arbitrary element in $T_j$ at the same cost as
using the set itself). Thus the total number of edges must be at least $\ell|H|$. □

Lemma 4.3. There is a feasible iBGP-Sum solution with at most $\ell|H| + \ell + (m + n + 4)^2$ edges.

Proof. The solution is simple: create a clique on the $a_i$, $b_{T_j}$, $z$, $u$, $y$, $h$ nodes
(which obviously has size at most $(m + n + 4)^2$), include an edge from every $x_i$ to $z$ (another
$\ell$ edges) and include an edge from every $x_i$ to every $a_k$ with $k \in H$ (another $\ell|H|$
edges). Obviously there are the right number of edges in this solution, so it remains to prove that
it is feasible. To show this we partition the pairs into types and show that every pair in every type
is satisfied. The types are 1) $x_i - b_{T_j}$, 2) $x_i - h$, 3) $x_i - x_j$, 4) $x_i - \alpha$
(where $\alpha$ is any other node in the gadget not included in a previous type), and
5) $\alpha - x_i$. This is clearly an exhaustive partitioning, so we can just demonstrate that
each type is satisfied in turn.

For the first type we already showed that $S(x_i, b_{T_j})$ includes all $a_k$ where $k \in T_j$.
Since $H$ is a valid hitting set $x_i$ must be adjacent to one such $a_k$, which in turn is adjacent
to $b_{T_j}$, forming a valid safe path. For the second type the only vertices outside
$B(x_i, d(x_i, h))$ are $x_j$ with $j \ne i$, and $z$ is closer to $h$ than to any such $x_j$. Thus
$z \in S(x_i, h)$ so the path $x_i - z - h$ in our solution is a valid safe path. For the third type
the vertices outside $B(x_i, d(x_i, x_j))$ are $\{x_k : k > j \text{ and } k \ne i\}$. Because of the
tie-breaking we introduced, $d(z, x_j) = M + j\epsilon$ while
$d(z, x_k) = M + k\epsilon > M + j\epsilon$, and thus $z \in S(x_i, x_j)$ and so the path
$x_i - z - x_j$ in our solution is a valid safe path. The fourth type is even simpler, since $\alpha$
must be either $z$, $u$, $y$, or an $a_k$ node and the shortest path from $x_i$ to any of these is
through $z$. So $z \in S(x_i, \alpha)$ and $x_i - z - \alpha$ is a valid safe path. Finally, for the
last type the vertices outside $B(\alpha, d(\alpha, x_i))$ are $\{x_k : k > i\}$, and $z$ is closer to
$x_i$ (distance $M + i\epsilon$) than to any such $x_k$ (distance $M + k\epsilon$). So again
$z \in S(\alpha, x_i)$ and thus $\alpha - z - x_i$ is a valid safe path. □

Theorem 4.4. It is NP-hard to approximate iBGP-Sum to a factor better than $\Omega(\log N)$, where
$N$ is the number of vertices in the metric.

Proof. It is known that there is some $\beta$ for which it is NP-hard to distinguish hitting set
instances with a hitting set of size at most $\beta$ from instances in which all hitting sets have
size at least $\beta \ln m$. In the first case we know from Lemma 4.3 that there is a valid iBGP-Sum
solution of size at most $\ell\beta + \ell + (m + n + 4)^2$. In the second case we know from Lemma 4.2
that any valid iBGP-Sum solution must have size at least $\ell\beta \ln m$. If we set
$\ell = (m + n + 4)^2$ this gives a gap of
$\ell\beta \ln m / (\ell(\beta + 2)) = \beta \ln m / (\beta + 2) = \Omega(\log m)$. The number of
vertices $N$ in the iBGP-Sum instance is $O((m + n + 4)^2)$ so $\log m = \Omega(\log N)$, and thus we
get $\Omega(\log N)$ hardness of approximation. □

It is also fairly simple to modify the basic gadget to prove the same logarithmic hardness for 
iBGP-Degree. We do this by duplicating everything other than x, instead of duplicating x. This 
will force x to have the largest degree. 

Theorem 4.5. It is NP-hard to approximate iBGP-Degree to a factor better than $\Omega(\log N)$,
where $N$ is the number of vertices in the metric.

Proof. We will use multiple copies of the above gadget. Let $\alpha$ be some large integer that we
will define later. We create $\alpha$ copies of the gadget but identify all of the $x$ vertices, so
there is still a unique $x$ but for all other nodes $v$ in the original there are now $\alpha$ copies
$v^1, v^2, \ldots, v^\alpha$. The distance between two nodes in the same copy is exactly as in the
original gadget, and the distance between two nodes in different copies (say $s^i$ and $t^j$) is the
distance implied by forcing them to go through $x$ (i.e. $d(s^i, t^j) = d(s, x) + d(x, t)$). Call this
metric $\mathcal{M} = (V, d)$. Every vertex in copy $i$ is closer to the rest of copy $i$ than to any
vertex in copy $j$, so Lemma 4.1 holds for every copy. Thus if the smallest hitting set is $H$ the
degree of $x$ in any feasible solution to iBGP-Degree on $\mathcal{M}$ must be at least $\alpha|H|$.

Conversely, we claim that there is a feasible solution to iBGP-Degree in which every vertex
has degree at most $\alpha(|H| + 1)$. Consider the solution in which $x$ is adjacent to $z^j$ and to
$a_i^j$ for all $j \in [\alpha]$ and $i \in H$, and all nodes (other than $x$) in copy $j$ are
adjacent to all other nodes (other than $x$) in copy $j$ for all $j \in [\alpha]$. By the above
analysis of $S(x, b^j_{T_k})$ we know that this solution satisfies these safe sets (via the safe path
$x - a_i^j - b^j_{T_k}$ where $i \in H$ is an element in $T_k$). It also obviously satisfies pairs not
involving $x$ in the same copy, since there is an edge directly between them. It remains to show that
pairs involving $x$ are satisfied and that pairs involving two different copies are satisfied.

For the first of these we will show that $z^j$ is in all safe sets of the form $S(x, w^j)$ where $w$
is not a $b$ node. This is easy to verify exhaustively. It is also true that $z^j$ is in all safe
sets of the form $S(w^j, x)$ even when $w$ is a $b$ node, since all vertices outside the ball
$B(w^j, d(w^j, x))$ are in different copies and the shortest path from $z^j$ to any node in a
different copy must go through $x$. Thus the path $x - z^j - w^j$ in our solution satisfies both of
these safe sets. Finally, it is again easy to verify that pairs in different copies are also
satisfied.

Now by setting $\alpha$ appropriately we are finished. Each copy has $n + m + 4$ nodes, so in the
feasible solution we have constructed the degree of any node other than $x$ is at most
$(n + m + 4) + 1$. If we set $\alpha$ to some value larger than this, say $(n + m + 4)^2$, we know
that the degree of $x$ has to be at least $(n + m + 4)^2 |H|$. It is known that it is hard to
distinguish between hitting set instances with hitting sets of size at most $\beta$ and those in
which every hitting set has size at least $\beta \ln m$ for some value $\beta$. Suppose that we are
in the first case, where there is a hitting set of size at most $\beta$. Then we constructed a
feasible solution to the iBGP-Degree problem with maximum degree at most
$(n + m + 4)^2 (\beta + 1)$. In the second case, where every hitting set has size at least
$\beta \ln m$, we showed that the degree of $x$ (and thus the maximum degree) must be at least
$(n + m + 4)^2 \beta \ln m$. This gives a gap of $\beta \ln m / (\beta + 1)$, which is clearly
$\Omega(\log m)$. Since the number of vertices in the iBGP-Degree instance is polynomial in $m$, this
implies $\Omega(\log N)$-hardness. □

5 Constrained Connectivity 

In this section we consider the hardness of the Constrained Connectivity problems and the inte- 
grality gaps of the natural LP relaxations. 

5.1 Hardness 

We now show that the Constrained Connectivity-Sum and Constrained Connectivity-Degree
problems are both hard to approximate to better than $2^{\log^{1-\epsilon} n}$ for any constant
$\epsilon > 0$. We do this via a reduction from Min-Rep, a problem that is known to be impossible to
approximate to better than $2^{\log^{1-\epsilon} n}$ unless
$NP \subseteq DTIME(n^{\mathrm{polylog}(n)})$ ||]. An instance of Min-Rep is a bipartite graph
$G = (U, V, E)$ in which $U$ is partitioned into groups $U_1, U_2, \ldots, U_m$ and $V$ is
partitioned into groups $V_1, V_2, \ldots, V_m$. There is a super-edge between $U_i$ and $V_j$ if
there is an edge $\{u, v\} \in E$ such that $u \in U_i$ and $v \in V_j$. The goal is to find a minimum
set $S$ of vertices such that for all super-edges $\{U_i, V_j\}$ there is some edge
$\{u, v\} \in E$ with $u \in U_i$ and $v \in V_j$ and $u, v \in S$. Vertices from a group that are in
$S$ are called the representatives of the group. It is easy to prove by a reduction from Label Cover
that Min-Rep is hard to approximate to better than $2^{\log^{1-\epsilon} n}$ and in particular it is
hard to distinguish the case when $2m$ vertices are enough (one from each part in the partition for
each side of the graph) from the case when $2m \times 2^{\log^{1-\epsilon} n}$ vertices are
necessary [13].

Figure 2: Basic hardness construction.

Given an instance of Min-Rep, we want to convert it into an instance of Constrained
Connectivity-Sum. We will create a graph with five types of vertices: $x_j^i$ for $j \in [m]$ and
$i \in [d]$; $U$; $V$; $y_j^i$ for $j \in [m]$ and $i \in [d]$; and $z$. Here the $x$ nodes represent
$d$ copies of the groups of $U$ and the $y$ nodes represent $d$ copies of the groups of $V$, where
$d$ is some parameter that we will define later. $z$ is a dummy node that we will use to connect
pairs that are not crucial to the analysis. Given this vertex set, there will be four types of edges:
$\{x_j^i, u\}$ for all $j \in [m]$ and $i \in [d]$ and $u \in U_j$; $\{u, v\}$ for all edges
$\{u, v\}$ in the original Min-Rep instance; $\{v, y_j^i\}$ for all $j \in [m]$ and $i \in [d]$ and
$v \in V_j$; and $\{w, z\}$ for all vertices $w$.

This construction is shown in Figure 2, except in the actual construction there are $d$ copies of
each node in the top and bottom layer and there is a $z$ node that is adjacent to all other nodes.
In Figure 2 the middle two layers are identical to the original Min-Rep problem, and the large
ellipses represent the groups. In the figure we have simply added a new vertex for each group, and
in the construction there are $d$ such new vertices per group as well as a $z$ vertex.

Now that we have described the constrained connectivity graph, we need to define the safe sets.
There are two types of safe sets: if in the original instance there is a super-edge between $U_i$
and $V_j$ then $S(x_i^k, y_j^k) = S(y_j^k, x_i^k) = \{x_i^k, y_j^k\} \cup U_i \cup V_j$ for all
$k \in [d]$. All other safe sets consist of the two endpoints and $z$. Let $e_{MR}$ denote the number
of super-edges in the Min-Rep instance, and let $n_{MR}$ denote the number of vertices.

The following theorem shows that this reduction works. The intuition behind it is that a safe 
path between an x node and a y node corresponds to using the intermediate nodes in the path as 
the representatives of the groups corresponding to the x and y nodes, so minimizing the number of 
labels is like minimizing the number of edges incident on x and y nodes. 

Theorem 5.1. The original Min-Rep instance has a solution of size at most $K$ if and only if there
is a solution to the reduced Constrained Connectivity problem of size at most
$Kd + e_{MR} + 2md + n_{MR}$.

Proof. We first prove the only if direction by showing that if there is a Min-Rep solution of size
$K$ then there is a Constrained Connectivity solution of size $Kd + e_{MR} + 2md + n_{MR}$. Let
$OPT_{MR}$ be the set of vertices in a Min-Rep solution of size $K$. Our constrained connectivity
solution includes all edges of type 4, i.e. we include a star centered at $z$. For each $i \in [d]$
and $j \in [m]$ we also include all edges of the form $\{x_j^i, u\}$ where
$u \in U_j \cap OPT_{MR}$ and all edges of the form $\{y_j^i, v\}$ where $v \in V_j \cap OPT_{MR}$.
Finally, for each super-edge in the Min-Rep instance we include the edge between the pair from
$OPT_{MR}$ that satisfies it (if there is more than one such pair we choose one arbitrarily). The
star clearly has $2md + n_{MR}$ edges, there are $Kd$ edges from $x$ and $y$ nodes to nodes in
$OPT_{MR}$, and there are clearly $e_{MR}$ of the third type of edges, so the total number of edges
in our solution is $Kd + e_{MR} + 2md + n_{MR}$ as required. To prove that it is a valid solution, we
first note that all pairs except those of the form $(x_i^k, y_j^k)$ or $(y_j^k, x_i^k)$ where
$\{U_i, V_j\}$ is a super-edge are satisfied via the star centered at $z$. For pairs
$(x_i^k, y_j^k)$ and $(y_j^k, x_i^k)$ with an associated super-edge, since $OPT_{MR}$ is a valid
solution there must be some $u \in U_i \cap OPT_{MR}$ and $v \in V_j \cap OPT_{MR}$ that have an edge
between them, and the above solution would include that edge as well as the edge from $x_i^k$ to $u$
and from $y_j^k$ to $v$, thus forming a safe path of length 3.

For the if direction we need to show that if there is a Constrained Connectivity solution of
size $Kd + e_{MR} + 2md + n_{MR}$ then there is a Min-Rep solution of size at most $K$. Let
$OPT_{CC}$ be a constrained connectivity solution with $Kd + e_{MR} + 2md + n_{MR}$ edges. Since
$S(w, z) = \{w, z\}$ for all vertices $w$, $2md + n_{MR}$ of those edges must be a star centered at
$z$, so only $Kd + e_{MR}$ edges are between other vertices. Obviously there need to be at least
$e_{MR}$ edges between $U$ and $V$, since otherwise it would be impossible to satisfy all of the
demands between $x$ and $y$ nodes corresponding to super-edges. Thus there are at most $Kd$ edges
incident on either $x$ or $y$ nodes. We can partition these edges into $d$ parts, where the edges in
the $i$th part are those incident on an $x^i$ or $y^i$ node. So there must be one part of size at
most $K$; let $i$ be this part. But since this is a valid constrained connectivity solution there is
a safe path between $x_j^i$ and $y_\ell^i$ for all $j, \ell$ such that there is a super-edge between
$U_j$ and $V_\ell$, and thus the nodes in $U$ and $V$ that are incident to edges in this $i$th part
must form a valid Min-Rep solution of size at most $K$. □
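
The "only if" direction can be exercised on a toy instance. The sketch below builds the edge set described in the proof from a given set of representatives and checks both its size and the safe paths for the super-edge demands. All identifiers (`build_solution`, `has_safe_path`, `check`, the tuple encoding of the x and y copies, and the example instance) are assumptions made for illustration, and groups are indexed from 0 rather than 1.

```python
from collections import deque


def build_solution(groups_u, groups_v, edges, reps, d):
    """Edge set used in the 'only if' direction, for representatives reps."""
    m = len(groups_u)
    grp_u = {u: j for j, g in enumerate(groups_u) for u in g}
    grp_v = {v: j for j, g in enumerate(groups_v) for v in g}
    sol = set()
    for j in range(m):
        for i in range(d):
            # Star edges from the copies x_j^i and y_j^i to z.
            sol.add(frozenset({("x", j, i), "z"}))
            sol.add(frozenset({("y", j, i), "z"}))
            # Edges from each copy to the representatives of its group.
            sol.update(frozenset({("x", j, i), u}) for u in groups_u[j] if u in reps)
            sol.update(frozenset({("y", j, i), v}) for v in groups_v[j] if v in reps)
    # Star edges from the original Min-Rep vertices to z.
    for w in list(grp_u) + list(grp_v):
        sol.add(frozenset({w, "z"}))
    # One edge between representatives for every super-edge.
    super_edges = {(grp_u[u], grp_v[v]) for u, v in edges}
    chosen = {}
    for u, v in edges:
        if u in reps and v in reps:
            chosen.setdefault((grp_u[u], grp_v[v]), frozenset({u, v}))
    sol.update(chosen.values())
    return sol, super_edges


def has_safe_path(sol, safe, s, t):
    """Breadth-first search from s to t using only vertices of the safe set."""
    seen, queue = {s}, deque([s])
    while queue:
        cur = queue.popleft()
        for e in sol:
            if cur in e:
                (other,) = e - {cur}
                if other in safe and other not in seen:
                    seen.add(other)
                    queue.append(other)
    return t in seen


def check(groups_u, groups_v, edges, reps, d):
    """Return (solution size, whether every super-edge demand has a safe path)."""
    sol, super_edges = build_solution(groups_u, groups_v, edges, reps, d)
    ok = all(
        has_safe_path(
            sol,
            {("x", a, k), ("y", b, k)} | set(groups_u[a]) | set(groups_v[b]),
            ("x", a, k),
            ("y", b, k),
        )
        for (a, b) in super_edges
        for k in range(d)
    )
    return len(sol), ok


if __name__ == "__main__":
    gu = [["u1", "u2"], ["u3"]]
    gv = [["v1", "v2"], ["v3"]]
    es = [("u1", "v1"), ("u2", "v2"), ("u1", "v3"), ("u3", "v3")]
    print(check(gu, gv, es, {"u1", "u3", "v1", "v3"}, d=2))
```

In the example, $K = 4$, $e_{MR} = 3$, $m = 2$, $n_{MR} = 6$ and $d = 2$, so the bound $Kd + e_{MR} + 2md + n_{MR}$ evaluates to 25.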

We can now set $d = n_{MR}^2$, which gives the following theorem:

Theorem 5.2. Constrained Connectivity-Sum cannot be approximated better than
$2^{\log^{1-\epsilon} n}$ for any $\epsilon > 0$ unless $NP \subseteq DTIME(n^{\mathrm{polylog}(n)})$.

Proof. We know that it is hard to distinguish between an instance of Min-Rep with a solution
of size at most $2m$ and an instance in which every solution is of size at least
$2m \times 2^{\log^{1-\epsilon} n_{MR}}$. Let $d = n_{MR}^2$. Then Theorem 5.1 implies that it is
hard to distinguish between an instance of constrained connectivity with a solution of size at most
$2m n_{MR}^2 + e_{MR} + 2m n_{MR}^2 + n_{MR} = O(m n_{MR}^2)$ and an instance in which every solution
has size at least
$2m 2^{\log^{1-\epsilon} n_{MR}} n_{MR}^2 + e_{MR} + 2m n_{MR}^2 + n_{MR} = \Omega(m n_{MR}^2 2^{\log^{1-\epsilon} n_{MR}})$.
This gives an inapproximability gap of $\Omega(2^{\log^{1-\epsilon} n_{MR}})$. Since $d = n_{MR}^2$
the number of vertices $n$ in our constrained connectivity instances is
$n_{MR} + 2m n_{MR}^2 \le O(n_{MR}^3)$, and thus
$\Omega(2^{\log^{1-\epsilon} n_{MR}}) = 2^{\Omega(\log^{1-\epsilon} n)}$. To get this to
$2^{\log^{1-\epsilon} n}$ we can simply use a smaller $\epsilon'$. □

We will now prove that Constrained Connectivity-Degree has the same hardness of approximation
as Constrained Connectivity-Sum. The reduction from Min-Rep to the degree problem is basically the
same as the reduction to the sum problem, except there are also $d^2$ additional copies of the gadget
other than the $x$ and $y$ nodes. More formally, now the nodes are $x_j^i$ and $y_j^i$ for
$j \in [m]$ and $i \in [d]$, $u^{i,j}$ for $u \in U$ and $i, j \in [d]$, $v^{i,j}$ for $v \in V$ and
$i, j \in [d]$, and $z^{i,j}$ for $i, j \in [d]$. Now intuitively each copy $ij$ of the original $U$,
$V$, and $z$ is hooked together exactly like in the original construction, and is hooked up to the
nodes $\{x_k^i\}_{k \in [m]}$ and $\{y_k^j\}_{k \in [m]}$ exactly as if they were one copy of the
outer $x$ and $y$ nodes of the original construction.

More formally, the edges are the same as before, except now each of the $d^2$ new copies is
independent. In other words, there is an edge between $x_j^i$ and $u^{i,k}$ for all $i, k \in [d]$
and $j \in [m]$ and $u \in U_j$, an edge between $y_j^i$ and $v^{k,i}$ for all $i, k \in [d]$ and
$j \in [m]$ and $v \in V_j$, an edge between $u^{i,j}$ and $v^{i,j}$ for all $i, j \in [d]$ and edges
$\{u, v\}$ in the original Min-Rep instance, an edge between $x_j^i$ and $z^{i,k}$ for all
$i, k \in [d]$ and $j \in [m]$, an edge between $y_j^i$ and $z^{k,i}$ for all $i, k \in [d]$ and
$j \in [m]$, an edge between $u^{i,k}$ and $z^{i,k}$ for all $i, k \in [d]$ and $u \in U$, and an edge
between $v^{i,k}$ and $z^{i,k}$ for all $i, k \in [d]$ and $v \in V$. Similarly, the safe sets are as
before but defined by the copies. That is, if there is a super-edge between $U_a$ and $V_b$ then
$S(x_a^i, y_b^j) = S(y_b^j, x_a^i) = \{x_a^i, y_b^j\} \cup U_a^{i,j} \cup V_b^{i,j}$. All safe sets
between nodes in the same copy $ij$ are the two endpoints together with $z^{i,j}$, and the safe set of
vertices in different copies is just all vertices.

Theorem 5.3. Constrained Connectivity-Degree cannot be approximated better than
$2^{\log^{1-\epsilon} n}$ for any constant $\epsilon > 0$ unless
$NP \subseteq DTIME(n^{\mathrm{polylog}(n)})$.

Proof. Every vertex in $U^{i,j}$ or $V^{i,j}$ can have degree at most $n_{MR} + 1$, since there are
only $n_{MR} - 1$ other nodes in its copy, and it can in addition be adjacent to $z^{i,j}$ and the
node $x_k^i$ or $y_k^j$ corresponding to its group $U_k$ or $V_k$ respectively. Every node $z^{i,j}$
has degree at most $n_{MR} + 2m \le 3n_{MR}$, since it can be adjacent to $n_{MR}$ nodes in
$U^{i,j}$ and $V^{i,j}$ as well as $m$ nodes from $X^i$ and $m$ nodes from $Y^j$. On the other hand,
every $x_k^i$ node and every $y_k^j$ node must be adjacent to at least one $U^{i,j}$ or $V^{i,j}$ node
respectively for all $d$ possibilities for the other copy index. So every such $x$ or $y$ node has
degree at least $d$, so if we set $d = 3n_{MR}$ we know that the node with maximum degree must be an
$x$ or a $y$ node.

Recall that it is hard to distinguish Min-Rep instances with solutions of size at most $2m$ from
those in which all solutions have size at least $2m 2^{\log^{1-\epsilon} n_{MR}}$. Suppose that there
is a solution of size $2m$, i.e. there is a solution with one representative from each group. Then
there is a solution to the corresponding Constrained Connectivity-Degree instance with max degree at
most $d$: every $x_j^i$ and $y_j^i$ is connected to its corresponding representative in each of the
$d$ copies corresponding to it as well as to the $z$ node for that copy, and in each copy $ij$ we
include all edges between $U^{i,j}$ and $V^{i,j}$ and all edges between those nodes and $z^{i,j}$. It
is easy to see that this is a valid solution: by the analysis of Theorem 5.1 we know that it is valid
inside of each copy, and to get between copies nodes $s^{i,j}$ and $t^{i',j'}$ can use the safe path
$s^{i,j} - z^{i,j} - x_h^i - z^{i,j'} - y_h^{j'} - z^{i',j'} - t^{i',j'}$, where $s$ and $t$ are
arbitrary nodes in their respective copies, and $h$ is an arbitrary index in $[m]$.

On the other hand, suppose that every solution to the Min-Rep instance has size at least
$2m 2^{\log^{1-\epsilon} n_{MR}}$. Then as in the analysis of Theorem 5.1 for every copy $ij$ there
must be at least $2m 2^{\log^{1-\epsilon} n_{MR}}$ edges that are either between $X^i$ and $U^{i,j}$
or between $Y^j$ and $V^{i,j}$. Thus there are at least $d^2 2m 2^{\log^{1-\epsilon} n_{MR}}$ such
edges. Since there are only $2md$ vertices in $X \cup Y$, at least one such vertex must have degree at
least $2^{\log^{1-\epsilon} n_{MR}} d$.

This shows that it is hard to approximate Constrained Connectivity-Degree to better
than $2^{\log^{1-\epsilon} n_{MR}} d / d = 2^{\log^{1-\epsilon} n_{MR}}$. Since the number of vertices
$n$ in our instances is polynomial in $n_{MR}$, this means that it is hard to approximate to better
than $2^{\Omega(\log^{1-\epsilon} n)}$. We can then get this to $2^{\log^{1-\epsilon} n}$ by using a
smaller $\epsilon'$. □
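
The averaging step in the second case can be written out explicitly. As a sketch, write $g = 2^{\log^{1-\epsilon} n_{MR}}$ for the Min-Rep gap (a shorthand introduced here only for readability) and note that the edges counted for different copies are distinct, since each copy has its own $U$ and $V$ nodes:

\[
\max_{w \in X \cup Y} \deg(w) \;\ge\; \frac{\text{number of such edges}}{|X \cup Y|}
\;\ge\; \frac{d^2 \cdot 2m \cdot g}{2md} \;=\; d \cdot g .
\]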

5.2 Integrality Gap for Constrained Connectivity 

We claim that the integrality gap of the flow and cut LP relaxations is large for both Constrained
Connectivity-Sum and Constrained Connectivity-Degree. The intuition is that we use
a Min-Rep instance in which the edges between each pair of groups form a matching (allowing the LP
to cheat by breaking up the flow) but many representatives are needed for a valid solution. This
instance is then changed into a Constrained Connectivity problem as in the hardness reduction.
These results are in many ways similar to the $\Omega(n^{1/3-\epsilon})$ integrality gap for Min-Rep recently proved
by Charikar et al. Q, but the reduction to Constrained Connectivity adds extra complications. 

The instances for which we will show a large integrality gap are derived from instances of the
Unique Games problem, in which we are given a graph $G = (V, E)$ and a set of permutations
$\pi_{uv}$ on some alphabet $\Sigma$ (one constraint for every edge $(u, v) \in E$) and are asked to
assign a value $x_u$ from $\Sigma$ to each vertex $u$ so as to satisfy the maximum number of
constraints of the form $\pi_{uv}(x_u) = x_v$.
This problem was first considered by Khot |]l^], who conjectured that it was NP-hard to distinguish 
instances on which $1 - \delta$ fraction of the constraints can be satisfied from instances on which
at most $\epsilon$ fraction of the constraints can be satisfied (for sufficiently small $\epsilon$ and
$\delta$). For our purposes we will consider a minimization version of the Unique Games problem in
which we can assign multiple labels to vertices and the goal is to assign as few labels as possible
so that for every edge $(u, v)$ there is some label $x_u$ assigned to $u$ with $\pi_{uv}(x_u)$
assigned to $v$. We first show that there exist instances that require many labels:

Lemma 5.4. For any constant $\epsilon < 1$, there are instances of Unique Games with alphabet size
$O\left(n^{\frac{2(1+\epsilon)}{1-3\epsilon}}\right)$ and $O(n^2)$ edges that require $\epsilon n^2$ labels for any valid solution.

Proof. We will prove this by the probabilistic method, i.e. we will analyze a random Unique Games
instance with the given parameters and show that the probability that it has a valid solution of size at
most $\epsilon n^2$ is strictly less than 1. This then implies the existence of such an instance. For our
random instance, the underlying graph will be $K_n$, so there is a permutation constraint on every
pair of vertices. Let $k = |\Sigma|$ be the size of the alphabet (we will later set this to the value claimed
in the lemma, but for now we will leave it as a parameter). For each pair of vertices we will then
select a permutation uniformly at random from $S_k$.

Now consider some fixed set $S$ of $an$ labels (so the average number of labels per node is $a$).
What is the probability that $S$ is a valid solution? By Markov's inequality, we know that at most
$n/2$ vertices have more than $2a$ labels, so there are at least $n/2$ vertices with at most $2a$ labels.
Call these vertices light, and call an edge light if both of its endpoints are light. Let $\{u, v\}$ be a light
edge. We claim that the probability that $S$ satisfies $\{u, v\}$ is at most $4a^2/k$. To see this, let $i \in \Sigma$
be one of the labels assigned to $u$ by $S$. Since the permutation for $\{u, v\}$ was chosen uniformly at
random, the probability that $i$ is matched to one of the labels assigned to $v$ by $S$ is at most $2a/k$.
Now we can do a union bound over all such labels $i$, of which there are at most $2a$, to get that the
probability that edge $\{u, v\}$ is satisfied by $S$ is at most $4a^2/k$. Since the permutations for each edge
are chosen independently, the event that edge $e$ is satisfied is independent of the event that edge $e'$
is satisfied for all $e' \neq e$. Thus the probability that $S$ satisfies every edge is at most the product of
the probabilities that it satisfies each light edge, i.e. the probability that $S$ is a valid solution is at
most $\left(\frac{4a^2}{k}\right)^{\binom{n/2}{2}} \le \left(\frac{4a^2}{k}\right)^{\Omega(n^2)}$ (for sufficiently large $n$).

By the trivial union bound, we know that the probability that there is some valid solution of
size $an$ for our random instance is at most the sum over all possible solutions of size $an$ of the
probability that the solution is valid, which by the above analysis is at most $|\{S : |S| = an\}|$
times the bound just derived. So we will now bound $N = |\{S : |S| = an\}|$, which is easy to do by a simple
counting argument. In particular, $N = \binom{kn}{an}$, since there are exactly $kn$ total
labels and we are just choosing $an$ of them. Now standard bounds for binomial coefficients imply
that $N \le \left(\frac{ekn}{an}\right)^{an} = \left(\frac{ek}{a}\right)^{an}$. Combining this with the previous analysis and setting $a = \epsilon n$
gives an upper bound on the probability that there is some valid solution of size $an$.

As long as $n$ is sufficiently large, setting $k = n^{\frac{2(1+\epsilon)}{1-3\epsilon}}$ makes this
bound less than 1. Since this is the probability that the random Unique Games instance we
selected has a valid solution of size $an$, this implies that for the given parameters there is some
Unique Games instance that requires more than $an = \epsilon n^2$ labels. □
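
To make the objects in this proof concrete, the following Python sketch samples a random instance on $K_n$ and checks whether a multi-label assignment is valid. The function names and the list-based encoding of permutations are assumptions made for this illustration and are not taken from the paper; the brute-force search is only practical for very small $n$ and $k$.

```python
import itertools
import random


def random_unique_games(n, k, seed=0):
    """Sample one uniformly random permutation of range(k) per pair u < v."""
    rng = random.Random(seed)
    perms = {}
    for u, v in itertools.combinations(range(n), 2):
        perm = list(range(k))
        rng.shuffle(perm)
        perms[(u, v)] = perm  # perm[i] is the label at v matched to label i at u
    return perms


def is_valid(perms, labels):
    """labels[u] is the set of labels assigned to vertex u.

    The assignment is valid if every edge (u, v) has some label i at u
    whose image perm[i] is among the labels assigned to v.
    """
    return all(
        any(perm[i] in labels[v] for i in labels[u])
        for (u, v), perm in perms.items()
    )


def min_labels_brute_force(perms, n, k):
    """Smallest total number of labels in a valid assignment (tiny inputs only)."""
    all_labels = [(u, i) for u in range(n) for i in range(k)]
    for size in range(0, n * k + 1):
        for chosen in itertools.combinations(all_labels, size):
            labels = {u: set() for u in range(n)}
            for u, i in chosen:
                labels[u].add(i)
            if is_valid(perms, labels):
                return size
    return None
```

Assigning every label to every vertex is always valid, so the search always terminates; the lemma concerns how large the minimum must be for a suitable instance.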

Now that we have found a Unique Games instance that requires many labels we would like to
use it to construct a Constrained Connectivity-Sum instance on which the flow LP has large
integrality gap. We will basically use the same transformation that we used in the reduction of
Min-Rep to Constrained Connectivity-Sum. Let $V_{UG}$ be the vertex set of the above Unique
Games instance, and let $\Sigma$ be the alphabet. Then our Constrained Connectivity-Sum instance
will have vertex set $V$ equal to the disjoint union of $V_{UG} \times [d]$, $V_{UG} \times \Sigma$, and a special node $z$, where
$d$ is a duplication parameter that we will set later. For ease of notation, we will let $x_i$ denote the $i$'th
copy of vertex $x$ in $V_{UG} \times [d]$, i.e. $x_i = (x, i)$. For all $x \in V_{UG}$ and $i \in [d]$ there is an edge from $x_i$ to
every vertex in $x \times \Sigma$. For every $x, y \in V_{UG}$ and $\alpha, \beta \in \Sigma$ there is an edge between $(x, \alpha)$ and $(y, \beta)$
if and only if assigning $\alpha$ to $x$ and $\beta$ to $y$ is sufficient to satisfy the $\{x, y\}$ edge in the Unique Games
instance (i.e. the permutation for that edge matches them up). There is also an edge between every
vertex and $z$. For $x, y \in V_{UG}$ and $i \in [d]$ we set $S(x_i, y_i) = S(y_i, x_i) = \{x_i, y_i\} \cup (x \times \Sigma) \cup (y \times \Sigma)$,
and we set all other safe sets to the two endpoints and $z$.
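
The construction above can be sketched in Python as follows. The node encoding (tuples tagged `"outer"`, `"inner"`, and `"z"`), the function name `build_instance`, and the dictionary encoding of permutations are hypothetical choices made for this illustration only.

```python
def build_instance(vertices, alphabet, perms, d):
    """Build the Constrained Connectivity-Sum instance described above.

    vertices: list of Unique Games vertices (V_UG)
    alphabet: list of labels (Sigma)
    perms:    dict mapping a pair (x, y) to a dict alpha -> beta, meaning
              label alpha at x is matched to label beta at y
    d:        duplication parameter
    Returns (nodes, edges, safe), where safe(u, v) returns the safe set.
    """
    z = ("z",)
    outer = [("outer", x, i) for x in vertices for i in range(d)]
    inner = [("inner", x, a) for x in vertices for a in alphabet]
    nodes = outer + inner + [z]

    edges = set()
    # Each copy x_i is adjacent to every node (x, alpha).
    for x in vertices:
        for i in range(d):
            for a in alphabet:
                edges.add(frozenset({("outer", x, i), ("inner", x, a)}))
    # (x, alpha) and (y, beta) are adjacent iff the permutation matches them.
    for (x, y), perm in perms.items():
        for a, b in perm.items():
            edges.add(frozenset({("inner", x, a), ("inner", y, b)}))
    # Every vertex is adjacent to z.
    for v in outer + inner:
        edges.add(frozenset({v, z}))

    def safe(u, v):
        # Pairs (x_i, y_i) with the same copy index i get the large safe set.
        if (u[0] == "outer" and v[0] == "outer"
                and u[1] != v[1] and u[2] == v[2]):
            block = {("inner", w, a) for w in (u[1], v[1]) for a in alphabet}
            return {u, v} | block
        # All other pairs: the two endpoints and z.
        return {u, v, z}

    return nodes, edges, safe
```

With $|V_{UG}| = 3$, $|\Sigma| = 2$ and $d = 2$ this produces 13 nodes and 30 edges, matching the counts $d|V_{UG}| + |\Sigma||V_{UG}| + 1$ and $d|V_{UG}||\Sigma| + \binom{|V_{UG}|}{2}|\Sigma| + d|V_{UG}| + |\Sigma||V_{UG}|$.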

Lemma 5.5. The value of the flow LP on the above Constrained Connectivity-Sum instance
is at most $2d|V_{UG}| + |\Sigma||V_{UG}| + \binom{|V_{UG}|}{2}$.

Proof. We prove this by constructing an LP solution of the required size. We first set the capacity
of every edge incident on $z$ to 1, for a total cost of $|\Sigma||V_{UG}| + d|V_{UG}|$. This is enough capacity to
satisfy all pairs other than those of the form $(x_i, y_i)$ or $(y_i, x_i)$, since for any other pair $z$ is in the
safe set so we can send one unit of flow on the edge from one endpoint to $z$ and then one unit of
flow on the edge from $z$ to the other endpoint.

Now we set the capacity of every other edge to $1/|\Sigma|$. Since the number of other edges is
$d|V_{UG}||\Sigma| + \binom{|V_{UG}|}{2}|\Sigma|$ this costs us $d|V_{UG}| + \binom{|V_{UG}|}{2}$ more, which when added to the cost of the
edges to $z$ gives us the claimed total LP value. So we just need to prove that this is enough capacity
to satisfy demands between $x_i$ and $y_i$ for all $x, y \in V_{UG}$ and $i \in [d]$. But this is easy to see: $x_i$ can
send $1/|\Sigma|$ flow to every node in $x \times \Sigma$ (for a total flow of 1), and each of these nodes will forward
its incoming flow to its neighbor in $y \times \Sigma$. Since this is a Unique Games instance this neighbor will
be unique, and each node in $y \times \Sigma$ will have exactly $1/|\Sigma|$ incoming flow, which it can then forward
along its edge to $y_i$. Thus we have enough capacity to send one unit of flow from $x_i$ to $y_i$. And $y_i$
can send flow to $x_i$ the same way, just in reverse. □

Lemma 5.6. Any integral solution to the above Constrained Connectivity-Sum instance must
have size at least $(d \times OPT_{UG}) + \binom{|V_{UG}|}{2} + d|V_{UG}| + |\Sigma||V_{UG}|$ where $OPT_{UG}$ is the minimum number
of labels needed to satisfy the original Unique Games instance.

Proof. The safe set of any node and $z$ is only that node and $z$, so all edges incident to $z$ need to
be present in any integral solution for a cost of $d|V_{UG}| + |\Sigma||V_{UG}|$. Furthermore, for every pair
$u, v \in V_{UG}$ at least one edge must be present from $(u \times \Sigma)$ to $(v \times \Sigma)$ since if no such edge existed
there would be no way of connecting $u_i$ and $v_i$ through $S(u_i, v_i)$ for any $i \in [d]$. This adds $\binom{|V_{UG}|}{2}$
to the total cost, so now we just need to prove that there must be at least $d \cdot OPT_{UG}$ edges between
$(V_{UG} \times [d])$ and $(V_{UG} \times \Sigma)$.

To show this, we will consider some arbitrary integral solution and partition the edges between
$(V_{UG} \times [d])$ and $(V_{UG} \times \Sigma)$ into $d$ parts where the $i$th part consists of those edges incident on nodes
$\{x_i : x \in V_{UG}\}$. If every part has size at least $OPT_{UG}$ then we are finished. To prove that this is
indeed the case, we will prove that for every part, the endpoints that are in $V_{UG} \times \Sigma$ actually form
a valid solution to the Unique Games instance. So consider the $i$th part of the partition. Suppose
that the associated label assignment does not form a valid solution to the Unique Games instance.
Then there is some pair $u, v \in V_{UG}$ such that no label assigned to $u$ is matched to a label
assigned to $v$ in the permutation corresponding to edge $\{u, v\}$. But
this clearly implies that there is no safe path from $u_i$ to $v_i$, as any such path must be of length
3 and pass through a label for $u$ and a label for $v$ that are matched to each other in the permutation
corresponding to edge $\{u, v\}$. This is a contradiction since the integral solution must be a valid
solution. □

Theorem 5.7. The flow LP for Constrained Connectivity-Sum has an integrality gap of
$\Omega(n^{1/3-\epsilon})$ for any constant $\epsilon > 0$.



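Proof. We will use the Unique Games instance of Lemma 5.4 in the above reduction. Lemma 5.5
implies that the flow LP has value at most $O\left(d|V_{UG}| + |V_{UG}|^{\frac{3-\epsilon}{1-3\epsilon}}\right)$ and Lemma 5.6 implies that any
integral solution has size at least $\Omega(d\epsilon|V_{UG}|^2) + |V_{UG}|^{\frac{3-\epsilon}{1-3\epsilon}}$. If we let $d = |\Sigma| = |V_{UG}|^{\frac{2(1+\epsilon)}{1-3\epsilon}}$ then
this gives us an integrality gap of

$$\Omega\left(\frac{\epsilon|V_{UG}|^{\frac{4-4\epsilon}{1-3\epsilon}}}{|V_{UG}|^{\frac{3-\epsilon}{1-3\epsilon}}}\right) = \Omega(\epsilon|V_{UG}|).$$

It is easy to see that the number of nodes $n$ in our reduction equals $d|V_{UG}| + |\Sigma||V_{UG}| + 1$ which in
this case is $O\left(|V_{UG}|^{\frac{3-\epsilon}{1-3\epsilon}}\right)$. Thus the integrality gap is $\Omega\left(n^{\frac{1-3\epsilon}{3-\epsilon}}\right)$, which is sufficient since we can set
$\epsilon$ to be arbitrarily small. □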
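
The arithmetic behind this gap can be checked numerically. The sketch below simply evaluates the bounds of Lemma 5.5 and Lemma 5.6; the function names and the sample parameter values are hypothetical and chosen only for illustration, with `k` standing for $|\Sigma|$, `n_ug` for $|V_{UG}|$, and `opt_ug` for $OPT_{UG}$.

```python
from math import comb


def lp_upper_bound(d, n_ug, k):
    """Lemma 5.5: cost of the fractional solution."""
    return 2 * d * n_ug + k * n_ug + comb(n_ug, 2)


def integral_lower_bound(d, n_ug, k, opt_ug):
    """Lemma 5.6: every integral solution has at least this many edges."""
    return d * opt_ug + comb(n_ug, 2) + d * n_ug + k * n_ug


def gap_lower_bound(d, n_ug, k, opt_ug):
    """Ratio of the two bounds, a lower bound on the integrality gap."""
    return integral_lower_bound(d, n_ug, k, opt_ug) / lp_upper_bound(d, n_ug, k)
```

For example, with the made-up values `d = 4`, `n_ug = 10`, `k = 4`, and `opt_ug = 50`, the LP bound is 165 and the integral bound is 325. The theorem's choice $d = |\Sigma|$ balances the terms so that the ratio grows like $\epsilon|V_{UG}|$.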

We can modify this construction to show a polynomial integrality gap for the flow LP for
Constrained Connectivity-Degree also. We will need Unique Games instances with the
same parameters as in Lemma 5.4 but on the complete bipartite graph rather than the complete
graph. It is easy to see that Lemma 5.4 can be modified to prove the existence of such instances.
Now the modification is basically the same as the modification we made to show hardness: we just
make $d^2$ copies of the inner Unique Games instance and connect them up to the $d$ copies of the
outer $x_i$ and $y_i$ nodes in the obvious way.

Theorem 5.8. The flow LP for Constrained Connectivity-Degree has an integrality gap of
$\Omega(n^{1/9-\epsilon})$ for any constant $\epsilon > 0$.

Proof. The maximum degree of any node other than the outer $d$ copies of the $x$ and $y$ nodes is at
most $2|V_{UG}|^{\frac{3-\epsilon}{1-3\epsilon}}$, so if we set $d$ equal to that value we know that the maximum degree must be
achieved by some copy of an $x_i$ or $y_i$. By splitting up the flow equally as in the proof of Lemma 5.5
we know that there is an LP solution in which the maximum degree is at most $d|\Sigma|/|\Sigma| + d = 2d$
(where the extra $d$ term is due to being adjacent to all associated $z$ copies). On the other hand,
we know that any valid integer solution must use at least $\epsilon|V_{UG}|^2$ edges incident on copies of $x_i$
or $y_i$ nodes for each of the $d^2$ instances. Thus there are at least $d^2\epsilon|V_{UG}|^2$ edges incident on these
nodes in total, and since there are $d|V_{UG}|$ such nodes there must be at least one with degree at
least $\epsilon d|V_{UG}|$. Thus the integrality gap is at least $\epsilon d|V_{UG}|/d = \epsilon|V_{UG}|$. The total number of nodes
in our Constrained Connectivity-Degree instance is $O(|V_{UG}|^3|\Sigma|^3) = O\left(|V_{UG}|^{\frac{9-3\epsilon}{1-3\epsilon}}\right)$, so this
means the integrality gap is $\Omega\left(n^{\frac{1-3\epsilon}{9-3\epsilon}}\right)$. By setting $\epsilon$ small enough this gives us the claimed gap of
$\Omega(n^{1/9-\epsilon})$. □

6 Hierarchical and Symmetric Safe Sets 

While the constraint that $G = K_n$ gave us some extra power for the iBGP problems, we did not
leverage the structure of the safe sets in any way. In this section we get rid of the requirement on $G$,
but show that if the safe sets have an extremely nice structure then Constrained Connectivity-Sum
can actually be solved optimally in polynomial time. In the hierarchical and symmetric safe
set version of Constrained Connectivity-Sum, $S(x, y) = S(y, x)$ for all $x, y \in V$, and if some
node $z \in S(x, y)$ then $S(x, z) \subseteq S(x, y)$ and $S(z, y) \subseteq S(x, y)$. We show that a simple greedy
algorithm solves this version optimally.

We say that a pair $\{x, y\}$ is an easy pair if there is some node $z \in S(x, y)$ such that $S(x, z) \subset S(x, y)$
and $S(y, z) \subset S(x, y)$, where both inclusions are strict. The pair $\{x, y\}$ is hard otherwise. Note that in a hard pair $\{x, y\}$,
every node $z$ in $S(x, y)$ has either $S(x, z) = S(x, y)$ or $S(y, z) = S(x, y)$ by the hierarchy property.
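
The easy/hard distinction can be tested mechanically. The following Python sketch assumes the safe sets are given as a dictionary keyed by unordered pairs, which is an encoding chosen for this illustration; it also assumes that the witness $z$ ranges over $S(x, y) \setminus \{x, y\}$, since $S(x, x)$ is not defined in the text.

```python
def is_easy(x, y, safe):
    """Return True if {x, y} is an easy pair.

    safe maps frozenset({u, v}) to the safe set S(u, v), so symmetry
    S(u, v) = S(v, u) holds by construction.
    """
    s_xy = safe[frozenset({x, y})]
    for z in s_xy - {x, y}:
        s_xz = safe[frozenset({x, z})]
        s_yz = safe[frozenset({y, z})]
        # "<" on Python sets tests for a strict subset.
        if s_xz < s_xy and s_yz < s_xy:
            return True
    return False


def hard_pairs(safe):
    """Return the set of all hard pairs."""
    result = set()
    for pair in safe:
        x, y = tuple(pair)
        if not is_easy(x, y, safe):
            result.add(pair)
    return result
```

On three nodes with $S(a, b) = \{a, b\}$, $S(b, c) = \{b, c\}$ and $S(a, c) = \{a, b, c\}$, the pair $\{a, c\}$ is easy (witnessed by $b$) while $\{a, b\}$ and $\{b, c\}$ are hard.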

Lemma 6.1. Let $G$ be a graph that has a safe path for all hard pairs. Then all easy pairs also have
a safe path in $G$, i.e. $G$ is a feasible solution.

Proof. We prove that every pair $\{x, y\}$ has a safe path in $G$ by induction on the size of safe sets. For
the base case, all pairs $\{x, y\}$ with $|S(x, y)| = 2$ are hard, so by assumption they have a safe path in
$G$. For the inductive step, suppose that there are safe paths for all pairs $\{u, v\}$ with $|S(u, v)| < k$,
and let $\{x, y\}$ be a pair with $|S(x, y)| = k$. If $\{x, y\}$ is hard then by assumption there is a safe path.
If it is easy, then there is some node $z \in S(x, y)$ such that $S(x, z) \subset S(x, y)$ and $S(y, z) \subset S(x, y)$.
Since these two subsets are strictly smaller, by induction there is an $x$-$z$ path contained in
$S(x, z) \subset S(x, y)$ and there is a $z$-$y$ path contained in $S(y, z) \subset S(x, y)$. Concatenating these
paths gives an $x$-$y$ path contained in $S(x, y)$. □

This lemma means that we don't have to worry about satisfying easy pairs, just hard ones. We 
now prove a few structural lemmas that will be useful when designing an algorithm. 

Lemma 6.2. Let $\{x, y\}$ be a hard pair. Then $S(u, v) \subseteq S(x, y)$ for all $u, v \in S(x, y)$.

Proof. Since $\{x, y\}$ is hard either $S(u, x) = S(x, y)$ or $S(u, y) = S(x, y)$. Without loss of generality
we assume that $S(u, x) = S(x, y)$. This implies that $v \in S(u, x)$, so by the hierarchy property we
know that $S(u, v) \subseteq S(u, x) = S(x, y)$. □

This clearly implies that if $G$ is a feasible solution and $\{x, y\}$ is a hard pair then $G|_{S(x,y)}$ is
connected and all pairs $u, v \in S(x, y)$ have a safe path contained in $S(x, y)$. We now prove some
lemmas about the structure of the optimal solution.

Lemma 6.3. Every edge $\{x, y\} \in OPT$ is a hard pair.

Proof. Suppose $\{x, y\}$ is an edge in OPT that is an easy pair. Then there is some $z \in S(x, y)$ such
that $S(x, z) \subset S(x, y)$ and $S(y, z) \subset S(x, y)$. Note that $y \notin S(x, z)$, since if it were then by the
hierarchy property we would have that $S(x, y) \subseteq S(x, z)$, so $S(x, z) = S(x, y)$, contradicting $\{x, y\}$
being an easy pair. Similarly, we know that $x \notin S(y, z)$. Since OPT is feasible there is an $x$-$z$
path in $S(x, z) \subset S(x, y)$ and a $z$-$y$ path in $S(y, z) \subset S(x, y)$, and by the previous observation
neither of them uses the $\{x, y\}$ edge. So there is an $x$-$y$ safe path in OPT that does not use the
$\{x, y\}$ edge. Any hard pair $\{u, v\}$ that uses the $\{x, y\}$ edge in a safe path can just use the path we
found through $z$, since by Lemma 6.2 $S(x, y) \subseteq S(u, v)$. Thus if we remove $\{x, y\}$ all of the hard
pairs still have a safe path, so by Lemma 6.1 so do all of the easy pairs. This contradicts OPT
being optimal. □

Order all hard pairs in nondecreasing order of safe set size, breaking ties arbitrarily. We say $\{a, b\} < \{c, d\}$ if
$\{a, b\}$ comes before $\{c, d\}$ in this ordering. We partition the edges of OPT as follows. Let $e = \{u, v\}$
be an edge in OPT, and let $\{x, y\}$ be the first hard pair in the ordering such that $u \in S(x, y)$ and
$v \in S(x, y)$, and assign $e$ to part $OPT_{\{x,y\}}$. By Lemma 6.3 all edges in OPT are hard pairs so this is
a valid partition. Let $OPT_{<\{x,y\}} = \bigcup_{\{a,b\}<\{x,y\}} OPT_{\{a,b\}}$, and let $OPT_{\le\{x,y\}}$ be defined analogously.

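Lemma 6.4. Let $\{x, y\}$ be a hard pair. Then $OPT_{\le\{x,y\}}|_{S(x,y)}$ is connected.

Proof. Let $\{u, v\}$ be an edge in $OPT|_{S(x,y)}$. Then since $\{u, v\}$ is a hard pair (by Lemma 6.3)
and $\{x, y\}$ is a hard pair with both $u$ and $v$ in $S(x, y)$, by the definition of the partition the part
$OPT_{\{a,b\}}$ containing $\{u, v\}$ must have $\{a, b\} \le \{x, y\}$. Thus $\{u, v\} \in OPT_{\le\{x,y\}}|_{S(x,y)}$. Since
$OPT|_{S(x,y)}$ is connected (OPT is feasible and $\{x, y\}$ is hard), the lemma follows. □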

We now finally give our algorithm. First we construct the above ordering. We then consider hard
pairs in this order, and when considering a pair $\{x, y\}$ we add the minimum number of edges required
to make our current graph restricted to $S(x, y)$ connected. This algorithm clearly returns a feasible
solution, since for any hard pair $\{x, y\}$ at some point we consider it and make sure that its safe set is
connected and that is sufficient by Lemma 6.1. For every hard pair $\{x, y\}$, let $ALG_{\{x,y\}}$ be the edges
added by the algorithm when considering $\{x, y\}$, and define $ALG_{<\{x,y\}} = \bigcup_{\{a,b\}<\{x,y\}} ALG_{\{a,b\}}$ and
$ALG_{\le\{x,y\}}$ analogously. Now we will prove that $|ALG| \le |OPT|$.
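
A minimal Python sketch of this greedy procedure is given below. It assumes the safe sets are supplied as a dictionary keyed by unordered pairs and already satisfy the hierarchy and symmetry properties, and that the safe set of every hard pair can be connected using edges of $G$; the function names, the union-find helper, and the tie-breaking rule are choices made for this illustration rather than details taken from the paper.

```python
def find(parent, v):
    """Union-find lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def is_easy(pair, safe):
    """True if some z in S(x, y) has S(x, z) and S(y, z) strictly inside S(x, y)."""
    x, y = tuple(pair)
    s_xy = safe[pair]
    return any(
        safe[frozenset({x, z})] < s_xy and safe[frozenset({y, z})] < s_xy
        for z in s_xy - {x, y}
    )


def greedy_overlay(graph_edges, safe):
    """graph_edges: set of frozenset({u, v}) edges of G.
    safe: dict mapping frozenset({u, v}) to the safe set S(u, v).
    Returns the set of edges chosen by the greedy algorithm.
    """
    hard = [p for p in safe if not is_easy(p, safe)]
    # Nondecreasing safe set size; ties broken by sorted node names.
    hard.sort(key=lambda p: (len(safe[p]), sorted(p)))
    chosen = set()
    for p in hard:
        s = safe[p]
        parent = {v: v for v in s}
        # Components of the current solution restricted to S(x, y).
        for e in chosen:
            if e <= s:
                u, v = tuple(e)
                parent[find(parent, u)] = find(parent, v)
        # Add edges of G inside S(x, y) only when they join two components,
        # which adds the minimum number of edges needed to connect S(x, y).
        for e in sorted(graph_edges, key=sorted):
            if e <= s:
                u, v = tuple(e)
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:
                    parent[ru] = rv
                    chosen.add(e)
    return chosen
```

Each hard pair triggers a spanning-forest step inside its safe set, so the number of edges added for that pair is exactly the number of components minus one, which is the quantity compared against OPT in the analysis that follows.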

Lemma 6.5. The endpoints of any edge in $OPT_{<\{x,y\}}|_{S(x,y)}$ are connected in $ALG_{<\{x,y\}}|_{S(x,y)}$.

Proof. Let $\{u, v\}$ be an edge in $OPT_{<\{x,y\}}|_{S(x,y)}$. Then $\{u, v\} \in OPT_{\{a,b\}}$ for some $\{a, b\} < \{x, y\}$.
By definition, this means that $\{a, b\}$ is the first pair in the ordering with a safe set that contains
both $u$ and $v$. By Lemma 6.2 we know that $S(u, v) \subseteq S(a, b)$. We also know that $\{u, v\}$ is a
hard pair by Lemma 6.3, so if $S(u, v) \subset S(a, b)$ then $\{u, v\}$ would be before $\{a, b\}$ in the ordering
and its safe set would contain both $u$ and $v$, contradicting the definition of $\{a, b\}$. Thus $S(u, v) = S(a, b)$.
After considering $\{a, b\}$ the algorithm guarantees that $ALG_{\le\{a,b\}}|_{S(a,b)}$ is connected, and therefore
there is a safe $u$-$v$ path in ALG after considering $\{a, b\}$. We also know from Lemma 6.2 that
$S(u, v) \subseteq S(x, y)$, so this safe path is entirely present in $ALG_{<\{x,y\}}|_{S(x,y)}$ and thus $u$ and $v$ are
connected in $ALG_{<\{x,y\}}|_{S(x,y)}$. □

Theorem 6.6. $|ALG| \le |OPT|$.

Proof. We will prove that $|ALG_{\{x,y\}}| \le |OPT_{\{x,y\}}|$ for all hard pairs $\{x, y\}$. Since these form
a partition of the edges of ALG and of OPT, this is sufficient to prove that $|ALG| \le |OPT|$.
Consider some such hard pair $\{x, y\}$. We know from Lemma 6.4 that $OPT_{\le\{x,y\}}|_{S(x,y)}$ is connected,
so $OPT_{\{x,y\}}$ must contain enough edges to connect the components of $OPT_{<\{x,y\}}|_{S(x,y)}$. By the
definition of the algorithm, $ALG_{\{x,y\}}$ has the minimum number of edges necessary to connect the
components of $ALG_{<\{x,y\}}|_{S(x,y)}$. Now since the number of components in $ALG_{<\{x,y\}}|_{S(x,y)}$ is at
most the number of components of $OPT_{<\{x,y\}}|_{S(x,y)}$ (by Lemma 6.5), this implies that
$|ALG_{\{x,y\}}| \le |OPT_{\{x,y\}}|$. □